Optimizing Bulk Indexing in Elasticsearch

We have an Elasticsearch cluster of 3 nodes with the following configuration per node:
CPU cores: 36
Memory (GB): 244
Disk (GB): 48000
IO performance: very high
The machines are in 3 different availability zones, namely eu-west-1c, eu-west-1a and eu-west-1b. Each Elasticsearch instance is allocated 30 GB of heap space.
We are using the above cluster for running aggregations only. The cluster has a replication factor of 1, all the string fields are not_analyzed, and doc_values is true for all fields.
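For reference, a minimal mapping sketch of the kind of field configuration described above (the index, type and field names here are hypothetical, using the pre-5.x string type):

curl -XPUT 'http://localhost:9200/agg_index' -d '{
  "mappings": {
    "event": {
      "properties": {
        "zone":  { "type": "string", "index": "not_analyzed", "doc_values": true },
        "value": { "type": "double", "doc_values": true }
      }
    }
  }
}'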
We are pumping data into this cluster by running 6 instances of Logstash in parallel (each with a batch size of 1000).
When more instances of Logstash are started one by one, the nodes of the Elasticsearch cluster start throwing out-of-memory errors.
What are the possible optimizations to speed up the bulk indexing rate on the cluster? Will placing the nodes of the cluster in the same zone increase the bulk indexing rate? Will adding more nodes to the cluster help?
A couple of steps taken so far:
Increased the bulk queue size from 50 to 1000
Increased the refresh interval from 1 second to 2 minutes
Changed segment merge throttling to none (https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html)
We cannot set the replication factor to 0 because of the inconsistency involved if one of the nodes goes down.
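For example, the refresh interval and merge throttling changes above can be applied roughly like this against the 1.x/2.x settings APIs (the index name is a placeholder; the bulk queue size is a node-level threadpool.bulk.queue_size setting in elasticsearch.yml):

# lengthen the refresh interval for the index being loaded
curl -XPUT 'http://localhost:9200/agg_index/_settings' -d '{
  "index": { "refresh_interval": "2m" }
}'
# disable store (merge) throttling cluster-wide
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "indices.store.throttle.type": "none" }
}'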

Related

Elasticsearch Memory Usage increases over time and is at 100%

I have seen indexing performance degrade over a period of time in Elasticsearch, and memory usage has slowly increased over time until it reached 100%. In this state I cannot index any more data. I have the default shard settings - 5 primary shards and 1 replica. My index is time based, with a new index created every hour to store coral service logs of various teams. An index comes to about 3 GB with 5 shards, and with the replica it is about 6 GB. With a single shard and 0 replicas it comes to about 1.7 GB.
I am using EC2 i2.2xlarge hosts, which offer 1.6 TB of disk space, 61 GB of RAM and 8 cores.
I have set the heap size to 30 GB.
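For reference, a sketch of how such hourly indices could be created with fewer shards via an index template (the template name and index pattern are hypothetical, using the 1.x/2.x template API):

curl -XPUT 'http://localhost:9200/_template/hourly_logs' -d '{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'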
Following are the node statistics:
https://jpst.it/1eznd
Could you please help me fix this? My whole cluster went down, to the point that I had to delete all the indices.

Indexing multiple indices in Elasticsearch at the same time

I am using Logstash for ETL purposes and have 3 indices in Elasticsearch. Can I insert documents into my 3 indices through 3 different Logstash processes at the same time to improve parallelization, or should I insert documents into 1 index at a time?
My Elasticsearch cluster configuration looks like this:
3 data nodes - 64 GB RAM each, SSD disk
1 client node - 8 GB RAM
Shards - 20
Replicas - 1
Thanks
As always it depends. The distribution concept of Elasticsearch is based on shards. Since the shards of an index live on different nodes, you are automatically spreading the load.
However, if Logstash is your bottleneck, you might gain performance from running multiple processes, though whether running multiple Logstash processes on a single machine will make a positive impact is doubtful.
Short answer: Parallelising over 3 indexes won't make much sense, but if Logstash is your bottleneck, it might make sense to run those in parallel (on different machines).
PS: The biggest performance improvement generally is batching requests together, but Logstash does that by default.
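For illustration, a batched request to the _bulk API looks roughly like this (the index names and documents are made up); each action line names its own target index, so a single batch can even mix indices, and Logstash assembles such requests for you:

curl -XPOST 'http://localhost:9200/_bulk' --data-binary '
{ "index": { "_index": "index_a", "_type": "doc" } }
{ "message": "first event" }
{ "index": { "_index": "index_b", "_type": "doc" } }
{ "message": "second event" }
'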

Elasticsearch 1.5.2 High JVM Heap in 2 nodes even without bulk indexing

We have been facing multiple downtimes recently, especially after a few hours of bulk indexing. To avoid further downtime, we disabled bulk indexing temporarily and added another node. The downtimes have now stopped, but two out of 6 nodes permanently remain at JVM heap > 80%.
We currently have a 6 node cluster (previously 5), each node being an EC2 c3.2xlarge with 16 GB of RAM and 8 GB of JVM heap, all master+data. We're using Elasticsearch 1.5.2, which has known issues such as OOM thrown on the merge thread (https://issues.apache.org/jira/browse/LUCENE-6670), and we faced the same regularly.
There are two major indices, used frequently for search and autosuggest, with doc count/size as follows:
health status index pri rep docs.count docs.deleted store.size
green open aggregations 5 0 16507117 3653185 46.2gb
green open index_v10 5 0 3445495 693572 44.8gb
Ideally, we keep at least one replica for each index, but our last attempt to add a replica with 5 nodes resulted in OOM errors and a full heap, so we turned it back to 0.
We also had two bulk update jobs running between 12 AM and 6 AM, each updating about 3 million docs with 2-3 fields on a daily basis. They were scheduled at 1.30 AM and 4.30 AM, each sending bulk feeds of 100 docs (about 12 KB in size) to the bulk API using a bash script with a sleep time of 0.25 s between requests, to avoid too many parallel requests.
When we started the bulk updates we had at most 2 million docs to update daily, but the doc count almost doubled in a short span (to 3.8 million) and we started seeing search response time spikes, mostly between 4 AM and 6 AM and sometimes even later. Our average search response time also increased from 60-70 ms to 150+ ms.
A week ago, the master left due to a ping timeout, and soon after that we received a shard failed error for one index. On investigating further, we found that this specific shard's data was inaccessible. To restore availability of the data, we restarted the node and reindexed the data.
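The throttled bulk feed described above was roughly of the following shape (a sketch only; the chunk files and their path are assumptions, each file holding newline-delimited bulk actions for ~100 docs):

for chunk in /data/bulk_chunks/*.json; do
  curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary "@$chunk" > /dev/null
  sleep 0.25   # pause between requests to avoid piling up parallel bulk calls
done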
However, the node downtime happened many more times, and each time shards went into the UNASSIGNED or INITIALIZING state. We finally deleted the index and started fresh, but heavy indexing again brought OutOfMemory errors and node downtime, with the same shard issue and data loss. To avoid further downtime, we stopped all bulk jobs and reindexed the data at a very slow rate.
We also added one more node to distribute the load. Yet we currently have 3 nodes with JVM heap constantly above 75%, 2 of them always above 80%. We have noticed that the number of segments and their size are relatively high on these nodes (about 5 GB), but running optimize on these indices would risk increasing the heap again, with a probability of downtime.
Another important point is that our Tomcat apps hit only 3 of the nodes (for normal search and indexing), and mostly one of the other two nodes was used for bulk indexing. Thus one of the three query+indexing nodes, and the node left for bulk indexing, have relatively high heap.
The following known issues with our configuration and indexing approach are ones we are planning to fix:
Bulk indexing hits only one node, thus increasing its heap and causing slightly high GC pauses.
mlockall is set to false.
A snapshot is needed to revert the index in such cases; we were still in the planning phase when this incident happened.
We can merge the 2 bulk jobs into one, to avoid too many indexing requests being queued at the same time.
We can use the optimize API at regular intervals in the bulk indexing script to avoid the existence of too many segments (see the sketch after this list).
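A sketch of such an optimize call against the 1.x API, using the aggregations index from the listing above (the segment target is a placeholder; on 2.1+ the endpoint was renamed to _forcemerge):

curl -XPOST 'http://localhost:9200/aggregations/_optimize?max_num_segments=5'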
Elasticsearch yml: (only relevant and enabled settings mentioned)
master: true
index.number_of_shards: 5
index.number_of_replicas: 2
path.conf: /etc/elasticsearch
path.data: /data
transport.tcp.port: 9300
transport.tcp.compress: false
http.port: 9200
http.enabled: true
gateway.type: local
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 3
discovery.zen.minimum_master_nodes: 4 # Now that we have 6 nodes
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
Node stats:
Pastebin link
Hot threads:
Pastebin link
If I understand correctly, you have 6 servers, each of them running one Elasticsearch node.
What I would do is run more than one node on each server, separating the roles: nodes that act as clients, nodes that act as data nodes, and nodes that act as masters. I think you can have two nodes on each server:
3 servers: data + client
3 servers: data + master
The client nodes and the master nodes will need less RAM. The configuration files will be more complex, but it will work better.
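As a sketch, with the 1.x role settings, the two nodes on a "data + client" server could be configured roughly like this (node names and which roles to pair are assumptions):

# node 1 on the server: data-only node
node.name: server1-data
node.master: false
node.data: true

# node 2 on the server: client-only node (routes requests, holds no data)
node.name: server1-client
node.master: false
node.data: false

On the "data + master" servers, the second node would instead set node.master: true and node.data: false.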

How to max out CPU cores on Elasticsearch cluster

How many shards and replicas do I have to set to use every CPU core (I want 100% load and the fastest query results) in my cluster?
I want to use Elasticsearch for aggregations. I read that Elasticsearch uses multiple CPU cores, but found no exact details about CPU cores with regard to shards and replicas.
My observation is that a single shard does not use more than 1 core/thread at query time (considering there is only one query at a time). With replicas, queries against a 1-shard index are not faster, since Elasticsearch does not seem to use the other nodes to distribute the load for a shard.
My questions (one query at a time):
Does a shard use no more than one CPU core?
Must shards always be scanned completely, and can replicas not be used to divide intra-shard load between nodes?
Is the formula for best performance SUM(CPU_CORES per node) * PRIMARY_SHARDS?
When doing an operation (indexing, searching, bulk indexing, etc.), a shard on a node uses one thread of execution, meaning one CPU core.
If you have one query running at a given moment, it will use one CPU core per shard. For example, a three node cluster with a single index that has 6 primary shards and one replica will have 12 shards in total, 4 shards on each node.
If there is only one query running on the cluster for that index, ES will query all 6 shards of the index (no matter whether they are primaries or replicas), and each node will use between 0 and 4 CPU cores for the job, because the round-robin algorithm ES uses to choose which copy of a shard performs the search may choose no shards on one node or at most 4 shards on one node.
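Following this one-thread-per-shard reasoning, a hedged rule of thumb for a single heavy query is to give the index roughly as many primary shards as the cluster has CPU cores in total; as a sketch (the index name and the 3 nodes x 8 cores sizing are assumptions):

curl -XPUT 'http://localhost:9200/agg_index' -d '{
  "settings": {
    "number_of_shards": 24,
    "number_of_replicas": 1
  }
}'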

Why is there unequal CPU usage as the Elasticsearch cluster scales?

I have a 15 node Elasticsearch cluster and am indexing a lot of documents. The documents are of the form { "message": "some sentences" }. When I had a 9 node cluster, I could get CPU utilization up to 80% on all of the nodes; after I turned it into a 15 node cluster, I get 90% CPU usage on 4 nodes and only ~50% on the rest.
The specification of the cluster is:
15 nodes, c4.2xlarge EC2 instances
15 shards, no replicas
There is a load balancer in front of all the instances, and the instances are accessed through the load balancer.
Marvel is running and is used to monitor the cluster
Refresh interval 1s
I could index 50k docs/sec on 9 nodes and only 70k docs/sec on 15 nodes. Shouldn't I be able to do more?
I'm not yet an expert on scalability and load balancing in ES, but some things to consider:
load balancing is native in ES, so putting a load balancer in front can actually work against the built-in load balancing. It's a bit like having a speed limiter on your car but still braking manually: the limiter should already do the job, and your manual input keeps it from doing it right. Have you tried bypassing your load balancer and relying only on the native load balancing to see how it fares?
while more servers/shards give you more CPU/computation power, every write/read now has to touch more shards, so if 1 shard can do N computations, M shards won't actually give you M*N computations
having 15 shards is probably overkill in a lot of cases
having 15 shards but no replicas is risky, since if any of your 15 servers fails, you won't be able to access your whole index
you can actually run multiple nodes on a single server
What is your index size in terms of storage?
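To check the index size and how evenly shards and load are spread across the nodes, the cat APIs can be used, for example:

curl 'http://localhost:9200/_cat/indices?v'   # store.size and doc count per index
curl 'http://localhost:9200/_cat/shards?v'    # which shards sit on which nodes
curl 'http://localhost:9200/_cat/nodes?v'     # per-node heap and load overview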

Resources