Elasticsearch replica shard recovery failed, yellow status

During an Elasticsearch recovery a primary shard was lost and a replica shard was promoted. After that, when the cluster tried to recover this shard, we got the following exception repeatedly:
RecoverFilesRecoveryException[[my-index][0] Failed to transfer [279] files with total size of [12.6gb]]; nested: RemoteTransportException[File corruption occured on recovery but checksums are ok]; ]]
In addition, the recovery status response for this shard was:
{
"id" : 0,
"type" : "REPLICA",
"stage" : "INDEX",
"primary" : false,
...
"index" : {
"files" : {
"total" : 279,
"reused" : 0,
"recovered" : 262,
"percent" : "93.9%"
},
"bytes" : {
"total" : 13630355592,
"reused" : 0,
"recovered" : 7036450677,
"percent" : "51.6%"
},
And after 2 minutes
"index" : {
"files" : {
"total" : 279,
"reused" : 0,
"recovered" : 276,
"percent" : "98.9%"
},
"bytes" : {
"total" : 13630355592,
"reused" : 0,
"recovered" : 10500690274,
"percent" : "77.0%"
}
The above numbers went up and down for quite some time, but the shard never really recovered and the cluster status remained yellow!
Is there a way to make the cluster green again? Perhaps delete this replica somehow?
A dirty solution was to delete the index containing the above shard and reindex it again.
ES version: 1.3.1
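A less destructive workaround than deleting the whole index is to drop just the stuck replica and let Elasticsearch rebuild it from the promoted primary by toggling the replica count (a sketch; my-index stands for the index from the error above):
PUT /my-index/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
Once the replica is gone and the cluster settles, set number_of_replicas back to 1 with the same call; the cluster then allocates and recovers a brand-new copy instead of resuming the corrupted transfer.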


Elasticsearch cluster dropping messages

When my cluster tries to ingest a message (from Filebeat), I get:
(status=400): {"type":"illegal_argument_exception","reason":"Validation Failed: 1: this action would add [2] shards, but this cluster currently has [2999]/[3000] maximum normal shards open;"}, dropping event!
Looking at the cluster health, it looks like it still has shards available:
"cluster_name" : "elastic",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 1,
"active_primary_shards" : 1511,
"active_shards" : 1511,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1488,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1159,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 57267,
"active_shards_percent_as_number" : 50.3834611537179
Any ideas?
TL;DR:
It seems like you only have one data node.
As you can see, you have 1511 active shards and 1488 unassigned shards; add them up and you get exactly 2999.
Elasticsearch does not allow a primary shard and its replicas to live on the same node, and since you have only one data node, all those unassigned shards are most likely replicas.
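A quick way to confirm that the unassigned copies are replicas is the cat shards API; the prirep column shows p for a primary and r for a replica:
curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state' | grep UNASSIGNED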
Solution
Short term
Create your new index with zero replicas:
PUT /my-index-000001
{
"settings": {
"index" : {
"number_of_replicas" : 0
}
}
}
You can also set all existing indices to zero replicas via:
PUT /*/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
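Note that this only changes indices that already exist; indices created later (for example, daily Filebeat indices) will again get the default of one replica. To change the default at creation time, you can use an index template. A sketch using the composable template API (available since 7.8); the template name and pattern here are illustrative, not from the question:
PUT /_index_template/zero-replicas
{
"index_patterns": ["filebeat-*"],
"template": {
"settings": {
"index" : {
"number_of_replicas" : 0
}
}
}
}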
Long term
You either need to change all your indices to have zero replicas, or add another data node so that those replicas can be allocated. At the moment you are risking data loss if the node crashes.
Then, as per the documentation, you can raise cluster.max_shards_per_node, the setting behind the [2999]/[3000] limit in the error message.
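For instance, a sketch of raising the limit cluster-wide (the value 2000 is arbitrary here; every open shard costs heap, so treat this as a stopgap rather than a fix):
PUT /_cluster/settings
{
"persistent": {
"cluster.max_shards_per_node": 2000
}
}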

Unexpected Disk Capacity issue using Elasticsearch cluster on EKS with an 8-exabyte EFS disk

I have configured an Elasticsearch cluster in my Kubernetes cluster (EKS). The Elasticsearch cluster has 3 nodes, and I have set up an 8 EB EFS volume for the nodes to store the data (thinking that I won't have any space issues for a while...).
[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
36 66.7gb 966.1gb 8191.9pb 8191.9pb 0 10.65.32.184 10.65.32.184 es-cluster-0
33 82.6gb 966.1gb 8191.9pb 8191.9pb 0 10.65.32.202 10.65.32.202 es-cluster-2
37 76gb 966.1gb 8191.9pb 8191.9pb 0 10.65.32.178 10.65.32.178 es-cluster-1
14 UNASSIGNED
The cluster's current health is:
[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cluster/health?pretty
{
"cluster_name" : "k8s-logs",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 56,
"active_shards" : 106,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 14,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 88.33333333333333
}
I can see that I have 14 "unassigned_shards", which matches perfectly with the last line of the /_cat/allocation output above.
When I started digging into what was happening, I found this:
[root@es-cluster-0 elasticsearch]# curl -s -XGET http://localhost:9200/_cluster/allocation/explain?pretty
{
"index" : "logstash-2022.01.22",
"shard" : 0,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2022-01-22T00:00:11.254Z",
"failed_allocation_attempts" : 5,
"details" : "failed shard on node [bf_GjmcUQGuCTk-_voh4Xw]: failed recovery, failure RecoveryFailedException[[logstash-2022.01.22][0]: Recovery failed from {es-cluster-0}{hYJ4ifx7R7yWJq6VFP3Drw}{jjAAtdcmQXeVpJXxj4DYcA}{10.65.32.184}{10.65.32.184:9300}{dilmrt}{ml.machine_memory=15878057984, ml.max_open_jobs=20, xpack.installed=true, transform.node=true} into {es-cluster-1}{bf_GjmcUQGuCTk-_voh4Xw}{QNp4DD51TQa716D4TjMFPg}{10.65.32.178}{10.65.32.178:9300}{dilmrt}{ml.machine_memory=15878057984, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[es-cluster-0][10.65.32.184:9300][internal:index/shard/recovery/start_recovery]]; nested: RemoteTransportException[[es-cluster-1][10.65.32.178:9300][internal:index/shard/recovery/clean_files]]; nested: UncategorizedExecutionException[Failed execution]; nested: NotSerializableExceptionWrapper[execution_exception: java.io.IOException: Disk quota exceeded]; nested: IOException[Disk quota exceeded]; ",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "7WHft5LVTYCEWvwKM64A-w",
"node_name" : "es-cluster-2",
"transport_address" : "10.65.32.202:9300",
"node_attributes" : {
"ml.machine_memory" : "15878057984",
"ml.max_open_jobs" : "20",
"xpack.installed" : "true",
"transform.node" : "true"
},
--- TRUNCATED ---
I don't know why it's saying Disk quota exceeded when the Elasticsearch cluster correctly reports the capacity it has available in /_cat/allocation. Is there any additional configuration I need to set up in order to tell Elasticsearch that we have enough space to work with?
See here for EFS limitations that can cause a disk quota error which is not necessarily related to disk size. Generally, EFS does not support a sizeable ES stack; for example, Elasticsearch expects 64K file descriptors per data-node instance, but EFS only supports 32K at the moment. If you look into your Elasticsearch logs, there could be a clue about which limitation has been breached.
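One operational note: the allocation explanation above shows failed_allocation_attempts : 5, and after five failures Elasticsearch stops retrying on its own. Once the underlying limitation is addressed, you have to ask it to retry explicitly:
POST /_cluster/reroute?retry_failed=true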

Elasticsearch Searchguard unassigned shards

I've run into an issue with unassigned Search Guard shards when adding new nodes to the Elasticsearch cluster. The cluster is located in a public cloud and has allocation awareness enabled with node.awareness.attributes: availability_zone. Search Guard has replica auto-expand enabled by default. The problem reoccurs when I have three nodes in one zone and one node in each of two other zones:
eu-central-1a = 3 nodes
eu-central-1b = 1 node
eu-central-1c = 1 node
I do understand this cluster configuration is kinda imbalanced; it is just a replay of a production issue. I want to understand the logic of Elasticsearch and Search Guard and why it causes such an issue. So here is my config:
{
"cluster_name" : "test-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 1032,
"active_shards" : 3096,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 99.96771068776235
}
indices
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open searchguard GuL6pHCUTUKbmygbIsLAYw 1 4 5 0 131.3kb 35.6kb
explanation
"deciders" : [
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[searchguard][0], node[a59ptCI2SfifBWmnmRoqxA], [R], s[STARTED], a[id=d3rMAN8xQi2xrTD3y_SUPA]]"
},
{
"decider" : "awareness",
"decision" : "NO",
"explanation" : "there are too many copies of the shard allocated to nodes with attribute [aws_availability_zone], there are [5] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [3] to be less than or equal to the upper bound of the required number of shards per attribute [2]"
}
]
searchguard config
{
"searchguard" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-all",
"provided_name" : "searchguard",
"creation_date" : "1554095156112",
"number_of_replicas" : "4",
"uuid" : "GuL6pHCUTUKbmygbIsLAYw",
"version" : {
"created" : "6020499"
}
}
}
}
}
Questions I have:
The Search Guard config says "number_of_replicas" : "4", but the allocation explanation says there are [5] total configured shard copies. So is 5 counting the primary as well? Even if so...
What is the problem with putting all three of those shards in one zone (eu-central-1a)? Even if that zone collapsed, we would still have two replicas in the other zones; isn't that enough to recover?
How does Elasticsearch calculate this condition, the required number of shards per attribute [2]? Given this limitation I can only go up to 2 * zones_count (2 * 3 = 6) copies in my cluster, which is really not much. It looks like there should be ways to overcome this limit.
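For what it's worth, the bound in the awareness decider's message works out as a simple ceiling division (this is a reading of the explanation above, not a quote from the documentation):
required shards per attribute = ceil(total shard copies / attribute values)
                              = ceil(5 / 3)
                              = 2
With auto_expand_replicas: 0-all on 5 data nodes you get 1 primary + 4 replicas = 5 copies, so at most 2 copies may land in any one zone. The two single-node zones hold one copy each, the three-node zone holds its maximum of two, and the fifth copy has nowhere legal to go, which is exactly the 1 unassigned shard in the health output.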

elasticsearch: How to interpret log file (cluster went to yellow status)?

Elasticsearch 1.7.2 on CentOS, 8GB RAM, 2 node cluster.
We posted the whole log here: http://pastebin.com/zc2iG2q4
When we look at /_cluster/health, we see 2 unassigned shards:
{
"cluster_name" : "elasticsearch-prod",
"status" : "yellow", <--------------------------
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 5,
"active_shards" : 8,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2, <--------------------------
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
In the log, we see:
marking and sending shard failed due to [failed to create shard]
java.lang.OutOfMemoryError: Java heap space
And other errors.
The only memory-related config value we have is:
indices.fielddata.cache.size: 75%
We are looking to:
understand the log more completely
understand what action we need to take to address the situation now (recover) and prevent it in the future
Additional details:
1) ES_HEAP_SIZE is stock, no changes. (Further, looking around, it is not clear where best to change it... /etc/init.d/elasticsearch?)
2) Our JVM stats are below. (Please note, as a test, I modified /etc/init.d/elasticsearch and added export ES_HEAP_SIZE=4g [in place of the existing "export ES_HEAP_SIZE" line] and restarted ES... Comparing two identical nodes, one with the changed elasticsearch file and one stock, the values below appear identical.)
"jvm" : {
"timestamp" : 1448395039780,
"uptime_in_millis" : 228297,
"mem" : {
"heap_used_in_bytes" : 81418872,
"heap_used_percent" : 7,
"heap_committed_in_bytes" : 259522560,
"heap_max_in_bytes" : 1037959168,
"non_heap_used_in_bytes" : 50733680,
"non_heap_committed_in_bytes" : 51470336,
"pools" : {
"young" : {
"used_in_bytes" : 52283368,
"max_in_bytes" : 286326784,
"peak_used_in_bytes" : 71630848,
"peak_max_in_bytes" : 286326784
},
"survivor" : {
"used_in_bytes" : 2726824,
"max_in_bytes" : 35782656,
"peak_used_in_bytes" : 8912896,
"peak_max_in_bytes" : 35782656
},
"old" : {
"used_in_bytes" : 26408680,
"max_in_bytes" : 715849728,
"peak_used_in_bytes" : 26408680,
"peak_max_in_bytes" : 715849728
}
}
},
"threads" : {
"count" : 81,
"peak_count" : 81
},
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 250,
"collection_time_in_millis" : 477
},
"old" : {
"collection_count" : 1,
"collection_time_in_millis" : 22
}
}
},
"buffer_pools" : {
"direct" : {
"count" : 112,
"used_in_bytes" : 20205138,
"total_capacity_in_bytes" : 20205138
},
"mapped" : {
"count" : 0,
"used_in_bytes" : 0,
"total_capacity_in_bytes" : 0
}
}
},
Solved.
The key here is the error "java.lang.OutOfMemoryError: Java heap space"
Another day, another gem from the ES docs:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
says (emphasis mine):
The default installation of Elasticsearch is configured with a 1 GB heap. For just about every deployment, this number is far too small. If you are using the default heap values, your cluster is probably configured incorrectly.
Resolution:
Edit: /etc/sysconfig/elasticsearch
Set ES_HEAP_SIZE=4g // this system has 8GB RAM
Restart ES
And tada.... the unassigned shards are magically assigned, and the cluster goes green.
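A quick way to confirm the new heap actually took effect after the restart (heap_max_in_bytes should jump from roughly 1 GB, as in the stats above, to roughly 4 GB):
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep heap_max_in_bytes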

Too many open files warning from elasticsearch

I'm getting the warning messages below continuously, and I'm not sure what should be done. I saw some related posts suggesting an increase in the number of file descriptors.
How do I do that?
And even if I increase it now, will I encounter the same issue as new indices are added? (I'm presently working with around 400 indices, with 6 shards and 1 replica each, and the number of indices tends to grow.)
[03:58:24,165][WARN ][cluster.action.shard ] [node1] received shard failed for [index9][2], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index9][2] failed recovery]; nested: EngineCreationFailureException[[index9][2] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index9/2/index/segments_1 (Too many open files)]; ]]
[03:58:24,166][WARN ][cluster.action.shard ] [node1] received shard failed for [index15][0], node[node_hash2], [P], s[INITIALIZING], reason [Failed to create shard, message [IndexShardCreationException[[index15][0] failed to create shard]; nested: IOException[directory '/data/elasticsearch/whatever/nodes/0/indices/index15/0/index' exists and is a directory, but cannot be listed: list() returned null]; ]]
[03:58:24,195][WARN ][cluster.action.shard ] [node1] received shard failed for [index16][3], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index16][3] failed recovery]; nested: EngineCreationFailureException[[index16][3] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index16/3/index/segments_1 (Too many open files)]; ]]
[03:58:24,196][WARN ][cluster.action.shard ] [node1] received shard failed for [index17][0], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index17][0] failed recovery]; nested: EngineCreationFailureException[[index17][0] failed to open reader on writer]; nested: FileNotFoundException[/data/elasticsearch/whatever/nodes/0/indices/index17/0/index/segments_1 (Too many open files)]; ]]
[03:58:24,198][WARN ][cluster.action.shard ] [node1] received shard failed for [index21][4], node[node_hash3], [P], s[INITIALIZING], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index21][4] failed recovery]; nested: EngineCreationFailureException[[index21][4] failed to create engine]; nested: LockReleaseFailedException[Cannot forcefully unlock a NativeFSLock which is held by another indexer component: /data/elasticsearch/whatever/nodes/0/indices/index21/4/index/write.lock]; ]]
Output of the nodes API:
curl -XGET 'http://localhost:9200/_nodes?os=true&process=true&pretty=true'
{
"ok" : true,
"cluster_name" : "whatever",
"nodes" : {
"node_hash1" : {
"name" : "node1",
"transport_address" : "transportip1",
"hostname" : "myhostip1",
"version" : "0.20.4",
"http_address" : "httpip1",
"attributes" : {
"data" : "false",
"master" : "true"
},
"os" : {
"refresh_interval" : 1000,
"available_processors" : 8,
"cpu" : {
"vendor" : "Intel",
"model" : "Xeon",
"mhz" : 2133,
"total_cores" : 8,
"total_sockets" : 8,
"cores_per_socket" : 16,
"cache_size" : "4kb",
"cache_size_in_bytes" : 4096
},
"mem" : {
"total" : "7gb",
"total_in_bytes" : 7516336128
},
"swap" : {
"total" : "30gb",
"total_in_bytes" : 32218378240
}
},
"process" : {
"refresh_interval" : 1000,
"id" : 26188,
"max_file_descriptors" : 16384
}
},
"node_hash2" : {
"name" : "node2",
"transport_address" : "transportip2",
"hostname" : "myhostip2",
"version" : "0.20.4",
"attributes" : {
"master" : "false"
},
"os" : {
"refresh_interval" : 1000,
"available_processors" : 4,
"cpu" : {
"vendor" : "Intel",
"model" : "Xeon",
"mhz" : 2400,
"total_cores" : 4,
"total_sockets" : 4,
"cores_per_socket" : 32,
"cache_size" : "20kb",
"cache_size_in_bytes" : 20480
},
"mem" : {
"total" : "34.1gb",
"total_in_bytes" : 36700303360
},
"swap" : {
"total" : "0b",
"total_in_bytes" : 0
}
},
"process" : {
"refresh_interval" : 1000,
"id" : 24883,
"max_file_descriptors" : 16384
}
},
"node_hash3" : {
"name" : "node3",
"transport_address" : "transportip3",
"hostname" : "myhostip3",
"version" : "0.20.4",
"attributes" : {
"master" : "false"
},
"os" : {
"refresh_interval" : 1000,
"available_processors" : 4,
"cpu" : {
"vendor" : "Intel",
"model" : "Xeon",
"mhz" : 2666,
"total_cores" : 4,
"total_sockets" : 4,
"cores_per_socket" : 16,
"cache_size" : "8kb",
"cache_size_in_bytes" : 8192
},
"mem" : {
"total" : "34.1gb",
"total_in_bytes" : 36700303360
},
"swap" : {
"total" : "0b",
"total_in_bytes" : 0
}
},
"process" : {
"refresh_interval" : 1000,
"id" : 25328,
"max_file_descriptors" : 16384
}
}
}
}
How to increase the maximum number of allowed open files depends slightly on your Linux distribution. Here are some instructions for Ubuntu and CentOS:
http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/
http://pro.benjaminste.in/post/318453669/increase-the-number-of-file-descriptors-on-centos-and
The Elasticsearch documentation recommends setting the maximum file limit to 32k or 64k. Since you are at 16k and already hitting the limit, I'd probably set it higher, to something like 128k. See: http://www.elasticsearch.org/guide/reference/setup/installation/
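On most distributions this boils down to an entry in /etc/security/limits.conf for whichever user runs Elasticsearch (a sketch; the user name elasticsearch is an assumption about your setup):
# /etc/security/limits.conf - raise the open-file limits for the ES user
elasticsearch soft nofile 131072
elasticsearch hard nofile 131072
The new limits apply the next time the user's session (or the service) starts.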
After upping the number of open files and restarting elasticsearch you will want to verify that it worked by re-running the curl command you mentioned:
curl -XGET 'http://localhost:9200/_nodes?os=true&process=true&pretty=true'
As you add more indices (along with more documents), you will also see the number of files elasticsearch keeps track of increase. If you notice performance degradation with all of the indices and documents, you can try adding a new node to your cluster: http://www.elasticsearch.org/guide/reference/setup/configuration/ - since you already have a sharded, replicated configuration, this should be a relatively painless process.
Stop Elasticsearch first. If you start it from the command line (bin/elasticsearch), specify the heap at startup. For example, I use a 16GB box, so my command is:
a. bin/elasticsearch -Xmx8g -Xms8g
b. Go to the config (elasticsearch/config/elasticsearch.yml) and ensure that bootstrap.mlockall: true
c. Increase ulimit -Hn and ulimit -Sn to more than 200000
If you start it as a service, then do the following:
a. export ES_HEAP_SIZE=10g
b. Go to the config (/etc/elasticsearch/elasticsearch.yml) and ensure that bootstrap.mlockall: true
c. Increase ulimit -Hn and ulimit -Sn to more than 200000
Make sure the heap size you set is no more than 50% of the machine's RAM, whether you start Elasticsearch as a service or from the command line.
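To confirm the new limits and memory locking took effect after the restart, re-check the process section of the nodes API (max_file_descriptors should reflect your new ulimit; newer versions also report an mlockall flag there):
curl -XGET 'http://localhost:9200/_nodes?process=true&pretty=true'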
Note that changing ulimits via /etc/security/limits.conf won't have any effect if Elasticsearch runs as a systemd service.
To increase the limit for Elasticsearch under systemd, create a file /etc/systemd/system/elasticsearch.service.d/override.conf with the following content:
[Service]
LimitNOFILE=infinity
Then run systemctl daemon-reload && systemctl restart elasticsearch.
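You can verify the override took effect with:
systemctl show elasticsearch | grep LimitNOFILE
It should now report infinity (some systemd versions print it as a very large number instead).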
