Elasticsearch cluster

I have an ES cluster with 5 machines.
One of those machines is always using more resources than the others. For instance, right now I see that the average load is about 7% CPU and 65% memory,
but node4 is strange: it is using 30% CPU and 86% memory.
The machines are exactly the same and the configuration is the same; the only difference is that node4 is a data-only node. And when I compare node4 with the others in Marvel, they are doing almost the same tasks.
Any suggestion on how to debug this and see why node4 is using more than the others?
PS. The reason I care is that my cluster has died a few times because of node4. I made some improvements in the app, but I still want to understand what is going on with node4.
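One way to start narrowing this down is to compare what node4 is actually running against the rest of the cluster, using the hot threads and cat APIs. A minimal sketch with the official Python client follows; the host and node names are placeholders, and the call style and column names assume a 7.x-era client (older versions expose a slightly different set), so treat it as a starting point rather than a full diagnosis.

```python
from elasticsearch import Elasticsearch

# Hypothetical host name; point this at any node in the cluster.
es = Elasticsearch(["http://node1:9200"])

# Per-node resource overview: roles, heap, RAM, CPU, load.
print(es.cat.nodes(v=True, h="name,node.role,heap.percent,ram.percent,cpu,load_1m"))

# Which shards ended up on node4, and how big they are.
print(es.cat.shards(v=True, h="index,shard,prirep,docs,store,node"))

# What node4 is busy doing right now (merges, searches, bulk, GC, ...).
# "node4" is assumed to be the node's name as registered in the cluster.
print(es.nodes.hot_threads(node_id="node4"))
```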

Two things about your cluster:
This is wrong: "all requests are sent to master (node1, node2)"! You should send the requests in round-robin fashion to all the nodes holding data, otherwise you'll have nodes that simply do more work than others.
You are wasting memory and overall resources by having a lot of small shards... You should consider moving to 1 primary and 1 replica for your indices. The default (5 primaries, 1 replica) is too much. Your indices are way too small to have 5 shards.
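For example, a minimal sketch of both changes with the official Python client (host names and the index name are placeholders; the client's default connection pool rotates through the listed hosts in round-robin fashion):

```python
from elasticsearch import Elasticsearch

# List the data-holding nodes so requests are spread across them instead of
# always hitting the same one or two machines. (Host names are placeholders.)
es = Elasticsearch([
    "http://node3:9200",
    "http://node4:9200",
    "http://node5:9200",
])

# Create new indices with 1 primary + 1 replica instead of the old
# 5-primary default, which is oversized for small indices.
es.indices.create(
    index="my-small-index",   # hypothetical index name
    body={"settings": {"number_of_shards": 1, "number_of_replicas": 1}},
)
```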

Related

Controlling where shards are allocated

My setup:
Two zones, fast and slow, with 5 nodes each.
fast nodes have ephemeral storage, whereas the slow nodes are NFS based.
Running Elasticsearch OSS v7.7.1. (I have no control over the version)
I have the following cluster setting: cluster.routing.allocation.awareness.attributes: zone
My index has 2 replicas, so 3 shard instances (1x primary, 2x replica)
I am trying to ensure the following:
1 of the 3 shard instances to be located in zone fast.
2 of the 3 shard instances to be located in zone slow (because it has persistent storage)
Queries to be run in shard in zone fast where available.
Inserts to only be acknowledged as written once they have been replicated.
Is this setup possible?
Link to a related question: How do I control where my primary and replica shards are located?
EDIT to add extra information:
Both fast and slow nodes run on a PaaS offering where we are not in control of hardware restarts, meaning there can technically be non-graceful shutdowns/restarts at any point.
I'm worried about unflushed data and/or index corruption, so I want multiple replicas to sit on the NFS-backed slow-zone nodes to reduce the likelihood of data loss, even though this will "overload" the slow zone with redundant data.
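For reference, two of the individual requirements map onto well-known request options, even though they do not by themselves enforce the 1-copy-fast / 2-copies-slow placement. The sketch below uses a 7.x-style Python client; the address, index name, document and node IDs are all placeholders: `wait_for_active_shards` covers the "only acknowledge once enough copies are available" side, and a `_prefer_nodes` search preference can steer queries toward the fast zone when those copies are available.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://es-coordinator:9200"])   # placeholder address

# Indexing: require at least 2 active shard copies (primary + 1 replica)
# before the write proceeds. This is a pre-flight check, not a guarantee
# about which zone the copies live in.
es.index(
    index="my-index",               # placeholder index name
    id="doc-1",
    body={"field": "value"},
    wait_for_active_shards=2,
)

# Searching: prefer shard copies on the fast-zone nodes when available,
# falling back to other copies otherwise. Values are node IDs (placeholders).
es.search(
    index="my-index",
    body={"query": {"match_all": {}}},
    preference="_prefer_nodes:fast-node-id-1,fast-node-id-2",
)

# Verify where the primary and replica copies actually ended up.
print(es.cat.shards(index="my-index", v=True, h="index,shard,prirep,state,node"))
```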

Configuring an Elasticsearch cluster with machines of different capacity (CPU, RAM) for rolling upgrades

Due to cost restrictions, I only have the following types of machines at disposal for setting up an ES cluster.
Node A: lean (w.r.t. CPU, RAM) instance
Node B: beefy (w.r.t. CPU, RAM) instance
Node M: "leaner than A" (w.r.t. CPU, RAM) instance
Disk-wise, both A and B have the same size.
My plan is to set up Node A and Node B as master-eligible data nodes, and Node M as a master-eligible-only node (no data storage).
Because the two data nodes are NOT identical, what would be the implications?
I am going to make it a cluster of only 3 machines for the possibility of rolling upgrades (the current volume of data and the expected growth for a few years can be managed with vertical scaling, and keeping the default number of shards and replicas would let me scale horizontally if there is ever a need).
There is absolutely no need for your machines to have the same specs. You will need 3 master-eligible nodes not just for rolling-upgrades, but for high availability in general.
If you want to scale horizontally you can do so either by creating more indices to hold your data, or by configuring your index to have multiple primary and/or replica shards. Since version 7 the default is for new indices to be created with 1 primary and 1 replica shard; a single index like this does not really allow you to scale horizontally.
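As a concrete illustration (a sketch only; the address, index name and shard counts are arbitrary), an index expected to need horizontal scaling later can be created with more than the v7 default of 1 primary shard, since the primary count cannot be changed afterwards without reindexing or splitting, while the replica count can be adjusted at any time:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])   # placeholder address

# Primary shard count is fixed at creation time, so pick it up front if
# you expect to spread the index over more data nodes later.
es.indices.create(
    index="logs-example",                       # hypothetical index name
    body={"settings": {"number_of_shards": 2, "number_of_replicas": 1}},
)

# The replica count, by contrast, is a dynamic setting and can be raised
# or lowered whenever more (or fewer) nodes are available.
es.indices.put_settings(
    index="logs-example",
    body={"index": {"number_of_replicas": 2}},
)
```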
Update:
With respect to load and shard allocation (where to put data), Elasticsearch by default will simply consider the amount of storage available. When you start up an instance of Elasticsearch, it introspects the hardware and configures its thread pools (number of threads & size of queue) for various tasks accordingly. So the number of available threads to process tasks can vary. If I'm not mistaken, the coordinating node (the node receiving the external request) will distribute indexing/write requests in a round-robin fashion, not taking load into consideration. Depending on your version of Elasticsearch, this is different for search/read requests, where the coordinating node will leverage adaptive replica selection, taking into account the load/response time of the various replicas when distributing requests.
Besides this, sizing and scaling is too complex a topic to be answered comprehensively in a simple response. It typically also involves testing to figure out the limits/boundaries of a single node.
BTW: the default number of primary shards was changed in v7.x of Elasticsearch, as oversharding was one of the most common issues Elasticsearch users were facing. A "reasonable" shard size is in the tens of gigabytes.
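To see whether existing indices are oversharded relative to that guideline, the cat shards API lists the size of every shard copy per node. A rough sketch (placeholder address, 7.x-style Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])   # placeholder address

# One line per shard copy, sorted by size: many shards in the MB or
# single-digit-GB range hint that an index has more primaries than it needs.
print(es.cat.shards(v=True, h="index,shard,prirep,docs,store,node", s="store"))
```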

Elasticsearch 1.5.2 High JVM Heap in 2 nodes even without bulk indexing

We have been facing multiple downtimes recently, especially after a few hours of bulk indexing. To avoid further downtimes, we disabled bulk indexing temporarily and added another node. Now the downtimes have stopped, but two out of 6 nodes permanently remain at JVM heap > 80%.
We currently have a 6-node cluster (previously 5), each node being an EC2 c3.2xlarge with 16 GB RAM and 8 GB of JVM heap, all master+data. We're using Elasticsearch 1.5.2, which has known issues like [OOM thrown on merge thread](https://issues.apache.org/jira/browse/LUCENE-6670), and we faced the same regularly.
There are two major indices used frequently for Search and autosuggest having doc count/size as follows:
health status index pri rep docs.count docs.deleted store.size
green open aggregations 5 0 16507117 3653185 46.2gb
green open index_v10 5 0 3445495 693572 44.8gb
Ideally, we keep at least one replica for each index, but our last attempt to add a replica with 5 nodes resulted in OOM errors and a full heap, so we turned it back to 0.
We also had two bulk update jobs running between 12 and 6 AM, each updating about 3 million docs (2-3 fields each) on a daily basis. They were scheduled at 1.30 AM and 4.30 AM, each sending bulk feeds of 100 docs (about 12 KB in size) to the bulk API via a bash script with a sleep of 0.25 s between requests to avoid too many parallel requests. When we started the bulk updates, we had at most 2 million docs to update daily, but the doc count almost doubled in a short span (to 3.8 million) and we started seeing search response time spikes, mostly between 4 and 6 AM and sometimes even later. Our average search response time also increased from 60-70 ms to 150+ ms. A week ago the master left due to a ping timeout, and soon after that we received a shard failed error for one index. On investigating further, we found that this specific shard's data was inaccessible. To avoid unavailability of data, we restarted the node and reindexed the data.
However, the node downtime happened many more times, and each time shards went into UNASSIGNED or INITIALIZING state. We finally deleted the index and started fresh. But heavy indexing again brought OutOfMemory errors and node downtime, with the same shard issue and data loss. To avoid further downtimes, we stopped all bulk jobs and reindexed the data at a very slow rate.
We also added one more node to distribute the load. Yet currently we have 3 nodes with JVM heap constantly above 75%, 2 of them always above 80%. We have noticed that the number of segments and their size are relatively high on these nodes (about 5 GB), but running optimize on these indices would risk increasing heap again, with a probability of downtime.
Another important point to note is that our Tomcat apps hit only 3 of the nodes (for normal search and indexing), and mostly one of the other two nodes was used for bulk indexing. Thus, one of the three nodes receiving query+indexing traffic, plus the node left for bulk indexing, have a relatively high heap.
The following are known issues with our configuration and indexing approach, which we are planning to fix:
Bulk indexing hits only one node, increasing its heap and causing slightly higher GC pauses.
mlockall is set to false
Snapshots are needed to restore the index in such cases; we were still in the planning phase when this incident happened.
We can merge the 2 bulk jobs into one, to avoid too many indexing requests being queued at the same time.
We can call the optimize API at regular intervals from the bulk indexing script to avoid accumulating too many segments (see the sketch below).
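A rough sketch of the last two points combined, assuming a 1.x-era elasticsearch-py client to match Elasticsearch 1.5.2; the host, index, type and field names are placeholders, and the exact helper and optimize signatures may differ between client versions:

```python
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://node5:9200"])        # placeholder bulk-only node

# Placeholder for the ~3 million partial updates produced by the merged job.
docs_to_update = [{"id": "1", "field_a": 10, "field_b": "x"}]

def batches(items, size=100):
    for i in range(0, len(items), size):
        yield items[i:i + size]

for batch in batches(docs_to_update, 100):
    actions = [
        {
            "_op_type": "update",
            "_index": "aggregations",            # placeholder index name
            "_type": "doc",                      # mapping types still required on 1.x
            "_id": d["id"],
            "doc": {"field_a": d["field_a"], "field_b": d["field_b"]},
        }
        for d in batch
    ]
    helpers.bulk(es, actions)
    time.sleep(0.25)                             # same pause the bash script used

# Periodically reduce the segment count; on 1.x this is the optimize API
# (newer versions renamed it to forcemerge). Run it off-peak, since merging
# itself costs CPU, IO and heap.
es.indices.optimize(index="aggregations", max_num_segments=5)
```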
Elasticsearch yml: (only relevant and enabled settings mentioned)
master: true
index.number_of_shards: 5
index.number_of_replicas: 2
path.conf: /etc/elasticsearch
path.data: /data
transport.tcp.port: 9300
transport.tcp.compress: false
http.port: 9200
http.enabled: true
gateway.type: local
gateway.recover_after_nodes: 2
gateway.recover_after_time: 5m
gateway.expected_nodes: 3
discovery.zen.minimum_master_nodes: 4 # Now that we have 6 nodes
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
Node stats:
Pastebin link
Hot threads:
Pastebin link
If I understand correctly, you have 6 servers, each of them running one Elasticsearch node.
What I would do is run more than one node on each server, separating the roles: nodes that act as clients, nodes that act as data nodes, and nodes that act as masters. I think you can have two nodes on each server:
3 servers: data + client
3 servers: data + master
The client nodes and the master nodes need much less RAM. The configuration files will be more complex, but it will work better.
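The role separation itself is done in each node's elasticsearch.yml (the node.master / node.data flags on 1.x). Once the nodes are up, a quick way to confirm the topology and compare heap usage per role is the cat nodes API; a sketch with the Python client (placeholder address, 7.x-style column names that may differ on older versions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://node1:9200"])   # placeholder address

# One row per node: its role (master-eligible, data, client/coordinating-only)
# next to its heap and RAM usage, so each role's JVM can be sized accordingly.
print(es.cat.nodes(v=True, h="name,node.role,master,heap.percent,heap.max,ram.percent"))
```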

ElasticSearch replica and nodes

I am currently testing clustering with Elasticsearch and have a question about replicas between the nodes.
As you can see in the screenshot from Head, I have 2 indexes:
movies has 5 shards and 2 replicas
students has 5 shards and 1 replica
Which one is better and which one is faster with 3 active nodes and why?
The costs of having a higher number of replicas would be:
more storage space required (obviously)
lower indexing performance
while the advantages would be:
better search performance
better resiliency
Note that even though you have 2 replicas, it does not mean that your cluster can endure 2 nodes going down, since all indexing requests would fail if only one out of the 3 copies of a shard is available (because of the indexing quorum).
For a detailed explanation, please refer to this official document.
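For what it's worth, the replica count is a dynamic index setting, so it is easy to try both layouts on the same 3-node cluster and measure. A minimal sketch with the Python client (placeholder address; the index names are the ones from the question):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])   # placeholder address

# Drop "movies" to 1 replica, or raise "students" to 2, on the fly;
# Elasticsearch rebalances the copies in the background.
es.indices.put_settings(index="movies", body={"index": {"number_of_replicas": 1}})
es.indices.put_settings(index="students", body={"index": {"number_of_replicas": 2}})

# Watch the copies being reallocated across the 3 nodes.
print(es.cat.shards(index="movies,students", v=True, h="index,shard,prirep,state,node"))
```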
"Better" is subjective.
With two replicas, you can handle two of the three machines in your cluster going down, though at the price of writing all the data to every machine. Read performance should also be higher as the cluster has more nodes from which to request the data.
With one replica, you can only survive the outage of one machine in your cluster, but you'll get a performance boost by writing 2 copies of the data across 3 servers (less IO on each server).
So it comes down to risk and performance. Hope that helps.

Why is there unequal CPU usage as elasticsearch cluster scales?

I have a 15-node Elasticsearch cluster and am indexing a lot of documents. The documents are of the form { "message": "some sentences" }. When I had a 9-node cluster, I could get CPU utilization up to 80% on all of them; when I turned it into a 15-node cluster, I get 90% CPU usage on 4 nodes and only ~50% on the rest.
The specification of the cluster is:
15 nodes, c4.2xlarge EC2 instances
15 shards, no replicas
There is a load balancer in front of all the instances, and the instances are accessed through the load balancer.
Marvel is running and is used to monitor the cluster
Refresh interval 1s
I could index 50k docs/sec on 9 nodes and only 70k docs/sec on 15 nodes. Shouldn't I be able to do more?
I'm not yet an expert on scalability and load balancing in ES, but here are some things to consider:
load balancing is native in ES, so having a load balancer in front can actually interfere with the built-in load balancing. It's kind of like having a speed limiter on your car but manually using the brakes: it doesn't make much sense, since the limiter should already do the job and will be prevented from doing it right when you add "manual regulation". Have you tried dropping the external load balancer and just using the native load balancing to see how it fares?
while you get more CPU/computation power across different servers/shards, it also forces you to go through multiple shards every time you write/read a document; so if 1 shard can do N computations, M shards won't actually give you M*N computations
having 15 shards is probably overkill in a lot of cases
having 15 shards but no replication is weird/bad, since if any of your 15 servers goes down, you won't be able to access your whole index
you can actually hold multiple nodes on a single server
What is your index size in terms of storage?
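To answer the storage question and to check whether the 15 shards (and the load) are actually spread evenly, the cat APIs give a quick view; a sketch with the Python client (placeholder address, 7.x-style call syntax):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])   # placeholder address

# Total on-disk size and document count per index.
print(es.cat.indices(v=True, h="index,pri,rep,docs.count,store.size,pri.store.size"))

# Disk used per node: a skewed distribution here often lines up with the
# nodes showing the highest CPU.
print(es.cat.allocation(v=True))

# Shard placement: confirm each of the 15 shards sits on a different node.
print(es.cat.shards(v=True, h="index,shard,prirep,docs,store,node"))
```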
