GC problems in ES 1.4.4 - elasticsearch

Frequent full GCs occur even though memory utilization is only 30-40%. We have not changed the default JVM settings.
Cluster details:
One master node
Two data nodes
All the replicas are allocated on one node and all the primaries are on the other node. That should not be a problem, right? Also, the frequent full GC is observed on only one of the two data nodes.
Shards:
We have around 75 shards with replication set to 1. I suspect the problem is shard over-allocation.
Most of the queries we hit are aggregation queries.
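For reference, per-node heap/GC counters and the shard layout can be checked with the node stats and cat APIs, to confirm which node is under pressure (localhost:9200 below is just a placeholder for one of the nodes):
curl -XGET 'localhost:9200/_nodes/stats/jvm?pretty'
curl -XGET 'localhost:9200/_cat/shards?v'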

Related

How do I control where my primary and replica shards are located?

So I have two zones: fast and slow.
I'm looking to ensure the primary shards are in the fast nodes and the replicas are in the slow nodes.
I have the following cluster settings:
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: fast,slow
If I add the following at index level:
"index.routing.allocation.require.zone": "fast"
It will allocate ONLY the primaries to fast. I could then be cheeky and do the following:
"index.routing.allocation.require.zone": "fast,slow"
and it would allocate the replicas to slow...as desired.
However, if I stop the fast nodes, it will allocate all the primary shards to the slow nodes. When the fast nodes come back online, their shards are allocated as replicas.
So the question is: how can I ensure that, for a given index, the shards on the fast nodes are the primaries whenever those nodes are online?
While your actions might temporarily get what you want, you can't guarantee this will always happen.
Allocation is done at the index level, irrespective of whether a shard is a primary or a replica.
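For reference, a minimal sketch of applying the index-level filter from the question through the index settings API (the index name my_index and the address are placeholders); as noted above, this constrains where copies of a shard may live, not which copy becomes the primary:
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.routing.allocation.require.zone": "fast"
}'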

Elastic search shard allocation

What happens when a node that left the cluster rejoins it after the cluster has rebalanced?
Suppose I have a cluster of 5 nodes, its status is green, one of the 5 nodes leaves the cluster, and I have configured delayed shard allocation. After the delayed-allocation timeout expires, the master promotes one of the replicas to primary, allocates the unassigned shards, and rebalances the cluster. What happens when the node that left the cluster rejoins it after all this rebalancing has been done? What about the shards present on the node that has rejoined the cluster?
Every node in your cluster computes a weight based on the number of shards it holds. You can tune the point at which rebalancing happens via the cluster-level shard allocation settings. Note that what matters is the number of shards, not their size; if your shards differ greatly in size, this may lead to balancing problems.
When a node rejoins the cluster, any recovery processes that are already in flight (copying shards to other nodes to regain green status, limited in number by cluster.routing.allocation.node_concurrent_recoveries) will complete before the next shard is recovered. When the cluster discovers an up-to-date copy of a missing shard on the rejoined node, it saves that information to the cluster state and takes no unnecessary action.
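As a sketch of the settings involved: the delayed allocation mentioned in the question is an index-level setting, while the concurrent-recovery limit is a cluster-level one (the values 5m and 2 below are only illustrative, and the address is a placeholder):
curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "index.unassigned.node_left.delayed_timeout": "5m"
}'
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}'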

How does Elasticsearch decide how many shards can be restored in parallel

I made a snapshot of an index with 24 shards. The index is roughly 700 GB in size.
When restoring, it restores 4 shards in parallel.
My cluster is a brand-new cluster with only one machine and no replicas.
The machine is an AWS c3.8xlarge with 32 vCPUs and 60 GB of memory.
I also followed How to speed up Elasticsearch recovery?.
When restoring, memory usage is maxed out. How does Elasticsearch decide how many shards can be restored in parallel?
I was also wondering how I can tune my machines' hardware configuration to make the restore faster. If my cluster had more machines, would the restore speed improve linearly?
Basically, for ES 6.x there are two settings that decide how fast primary shards are recovered:
cluster.routing.allocation.node_initial_primaries_recoveries sets the number of primary shards that recover in parallel on one node. The default is 4, so for a cluster with N machines the total number of shards recovering in parallel is N*node_initial_primaries_recoveries (see https://www.elastic.co/guide/en/elasticsearch/reference/current/shards-allocation.html#_shard_allocation_settings).
indices.recovery.max_bytes_per_sec throttles how much bandwidth recovery traffic may consume. The default is 40mb (see https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery.html).
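Both settings are dynamic, so a sketch of raising them through the cluster settings API could look like this (the values 8 and 100mb are only illustrative, and on 6.x curl needs the JSON content-type header):
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 8,
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}'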

One node in Elasticsearch cluster permanently overloaded

I have an Elasticsearch cluster with 11 nodes. Five of these are data nodes and the others are client nodes through which I add and retrieve documents.
I am using the standard Elasticsearch configuration. Each index has 5 shards plus replicas. In the cluster I have 55 indices and roughly 150 GB of data.
The cluster is very slow. With the Kopf plugin I can see the stats of each node, and there I can see that one single data node (not the master) is permanently overloaded. Heap, disk, and CPU are fine, but its load is at 100% almost all the time. I have noticed that on this node every shard is a primary, whereas all the other data nodes hold both primary shards and replicas. When I shut that node down and bring it back up, the same problem occurs on another data node.
I don't know why this happens or how to solve it. I thought the client nodes and the master node distributed the requests evenly? Why is one data node always overloaded?
Try the following settings:
cluster.routing.rebalance.enable:
Enable or disable rebalancing for specific kinds of shards:
all - (default) Allows shard balancing for all kinds of shards.
primaries - Allows shard balancing only for primary shards.
replicas - Allows shard balancing only for replica shards.
none - No shard balancing of any kind is allowed for any indices.
cluster.routing.allocation.allow_rebalance:
Specify when shard rebalancing is allowed:
always - Always allow rebalancing.
indices_primaries_active - Only when all primaries in the cluster are allocated.
indices_all_active - (default) Only when all shards (primaries and replicas) in the cluster are allocated.
cluster.routing.allocation.cluster_concurrent_rebalance:
Controls how many concurrent shard rebalances are allowed cluster-wide.
Defaults to 2.
Sample curl to apply desired settings:
curl -XPUT <elasticsearchserver>:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all"
  }
}'
You can replace transient with persistent if you want your settings to persist across restarts.
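If you want to set the other two settings described above in the same request, a sketch could look like this (the values shown are just the defaults):
curl -XPUT <elasticsearchserver>:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.rebalance.enable" : "all",
    "cluster.routing.allocation.allow_rebalance" : "indices_all_active",
    "cluster.routing.allocation.cluster_concurrent_rebalance" : 2
  }
}'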

How to max out CPU cores on Elasticsearch cluster

How many shards and replicas do I have to configure to use every CPU core in my cluster (I want 100% load and the fastest query results)?
I want to use Elasticsearch for aggregations. I read that Elasticsearch uses multiple CPU cores, but I found no exact details about how CPU cores relate to sharding and replicas.
My observation is that a single shard does not use more than 1 core/thread at query time (assuming there is only one query at a time). With replicas, queries against a 1-shard index are not faster, since Elasticsearch does not seem to use the other nodes to distribute the load of a single shard.
My questions (one query at a time):
A shard does not use more than one CPU core?
Shards must always be scanned completely, and replicas cannot be used to divide intra-shard load between nodes?
The formula for best performance is SUM(CPU_CORES per node) * PRIMARY_SHARDS?
When doing an operation (indexing, searching, bulk indexing, etc.) a shard on a node uses one thread of execution, meaning one CPU core.
If you have one query running at a given moment, it will use one CPU core per shard. For example, a three-node cluster with a single index that has 6 primary shards and one replica will have 12 shards in total, 4 shards on each node.
If there is only one query running on the cluster for that index, ES will query the 6 shards of the index (one copy of each, no matter whether it is a primary or a replica), and each node will use between 0 and 4 CPU cores for the job, because the round-robin algorithm ES uses to pick which copy of a shard performs the search can choose no shards on one node or up to 4 shards on one node.
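To see how the copies of an index's shards are actually spread across nodes, and therefore how many cores a single query can use on each node, the cat shards API can be used (my_index is a placeholder):
curl -XGET 'localhost:9200/_cat/shards/my_index?v'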
