UnavailableShardsException when running tests with 1 shard and 1 node - elasticsearch

We are running our tests (PHP application) in Docker. Some tests use Elasticsearch.
We have configured Elasticsearch to have only 1 node and 1 shard (for simplicity). Here is the config we added to the default:
index.number_of_shards: 1
index.number_of_replicas: 0
Sometimes when the tests run, they fail because of the following Elasticsearch response:
{
  "_indices": {
    "acme": {
      "_shards": {
        "total": 1,
        "successful": 0,
        "failed": 1,
        "failures": [
          {
            "index": "acme",
            "shard": 0,
            "reason": "UnavailableShardsException[[acme][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: delete_by_query {[acme][product], query [{\"query\":{\"term\":{\"product_id\":\"3\"}}}]}]"
          }
        ]
      }
    }
  }
}
The error message extracted from the response:
UnavailableShardsException[[acme][0] Primary shard is not active or isn't assigned to a known node. Timeout: [1m], request: delete_by_query {[acme][product], query [{\"query\":{\"term\":{\"product_id\":\"3\"}}}]}]
Why would our client randomly fail to connect to Elasticsearch's node or shard? Does this have something to do with the fact that we have only 1 shard? Is that a bad thing?
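One way to rule out a start-up race (the primary shard simply not being allocated yet when the first test hits the index) is to wait for the index to reach at least yellow health before the suite runs. A minimal sketch using the cluster health API; the index name acme is taken from the error above, and the 30s timeout is an arbitrary assumption:
# Blocks until the primary shard of "acme" is active (yellow = all primaries allocated),
# or returns with "timed_out": true if that does not happen within 30 seconds.
curl -XGET 'http://localhost:9200/_cluster/health/acme?wait_for_status=yellow&timeout=30s&pretty'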

Related

Elasticsearch cannot assign shard 0

I'm new to Elasticsearch and I have an index in a red state due to a "cannot assign shard 0" error.
I found a way to get the explanation, but I'm still lost on understanding and fixing it. The server's version is 7.5.2.
curl -XGET 'http://localhost:9200/_cluster/allocation/explain' returns
{
  "index": "event_tracking",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "CLUSTER_RECOVERED",
    "at": "2020-12-22T14:51:08.943Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "cfsLU-nnRTGQG1loc4hdVA",
      "node_name": "xxx-clustername",
      "transport_address": "127.0.0.1:9300",
      "node_attributes": {
        "ml.machine_memory": "7992242176",
        "xpack.installed": "true",
        "ml.max_open_jobs": "20"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "replica_after_primary_active",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        },
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[event_tracking][0], node[cfsLU-nnRTGQG1loc4hdVA], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=TObxz0EFQbylZsyTiIH7SA], unassigned_info[[reason=CLUSTER_RECOVERED], at[2020-12-22T14:51:08.943Z], delayed=false, allocation_status[fetching_shard_data]]]"
        },
        {
          "decider": "throttling",
          "decision": "NO",
          "explanation": "primary shard for this replica is not yet active"
        }
      ]
    }
  ]
}
I, more or less, understand the error message but I can't find the proper way to fix it. This server is not running on Docker, it's directly installed in the Linux machine.
curl -XGET 'http://localhost:9200/_cat/recovery/event_tracking?v' result
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
event_tracking 0 54.5m existing_store translog n/a n/a 127.0.0.1 xxx-cluster n/a n/a 0 0 100.0% 106 0 0 100.0% 2857898852 7061000 6489585 91.9%
What can I try to resolve this?
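The explain output above describes the replica ("primary": false), and both deciders are really saying the same thing: the primary copy of shard 0 is not active yet, so the primary is the copy worth investigating. A minimal sketch asking the same API specifically about the primary; the index name and shard number are copied from the output above:
curl -XPOST 'http://localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '
{
  "index": "event_tracking",
  "shard": 0,
  "primary": true
}'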

Deleting index from ElasticSearch (via Kibana) automatically being recreated?

I have created an ElasticSearch instance via AWS and have pushed some test data into it in order to play around with Kibana. I'm done playing around now and want to delete all my data and start again. I have run a delete command on my index:
Command
DELETE /uniqueindex
Response
{
"acknowledged" : true
}
However, almost immediately my index seems to reappear and documents start showing up in the document count as well.
Command
GET /_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 e3LQWRvgSvqSL8CFTyw_SA 1 0 3 0 15.2kb 15.2kb
yellow open uniqueindex Y4tlNxAXQVKUs_DjVQLNnA 5 1 713 0 421.7kb 421.7kb
It's as if it's being auto-generated after the delete. Clearly it's a setting or something, but being new to Elasticsearch/Kibana I'm not sure what I'm missing.
By default, indices in Elasticsearch can be created automatically just by PUTting or POSTing a document.
You can change this behavior with the action.auto_create_index setting, which lets you disable automatic creation entirely (indices then need to be created explicitly with a PUT command) or whitelist only specific indices.
Quoting from the linked docs:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}
A leading + in a pattern allows automatic index creation for matching names, while - forbids it.
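To see what the cluster is currently using before or after changing it, the effective value can be read back. A minimal sketch in the same console style as above; flat_settings only flattens the key names to make the output easier to scan:
GET _cluster/settings?include_defaults=true&flat_settings=true
The action.auto_create_index value will show up under persistent, transient, or defaults depending on where it is set.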

Elasticsearch restore to a new cluster with a different number of nodes

I have an ops cluster with 5 nodes (1 master, 1 client, and 3 data nodes). I want to restore a backup of it onto a new test cluster with only 3 nodes (1 master, 1 client, 1 data). The test cluster currently has just 1 data node and I wasn't planning to add any more.
The issue is that when I try to restore to the test cluster, only some of the shards get assigned; most of them stay in the UNASSIGNED state. I've tried to use the reroute API, but it fails. See below.
Does my test cluster have to have the same number of nodes as the ops cluster I'm restoring from? If so, is there any workaround for this?
{
  "error": {
    "root_cause": [
      {
        "type": "reroute_transport_exception",
        "reason": "[myhost_master][myhostip:9200][cluster:admin/reroute]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "resolved [myhostip] into [3] nodes, where expected to be resolved to a single node"
  },
  "status": 400
}
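If the shards that stay UNASSIGNED are replicas, that is expected with a single data node, because a replica can never be allocated to the node that already holds its primary. One option, assuming the snapshot restore API, is to drop the replica count at restore time; my_backup_repo and my_snapshot are placeholder names:
POST /_snapshot/my_backup_repo/my_snapshot/_restore
{
  "indices": "*",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}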

How to delete data from a particular shard

I have an index with 5 primary shards and no replicas.
One of my shards (shard 1) is in an unassigned state. When I checked the log file, I found the following error:
2obv65.nvd, _2vfjgt.fdx, _3e3109.si, _3dwgm5_Lucene45_0.dvm, _3aks2g_Lucene45_0.dvd, _3d9u9f_76.del, _3e30gm.cfs, _3cvkyl_es090_0.tim, _3e309p.nvd, _3cvkyl_es090_0.blm]]; nested: FileNotFoundException[_101a65.si]; ]]
When I checked the index, I could not find the _101a65.si file for shard 1.
I am unable to locate the missing .si file, and despite many attempts I could not get shard 1 assigned again.
Is there any other way to get shard 1 assigned again, or do I need to delete the entire shard 1 data?
Please suggest.
Normally in the stack trace you should see the path to the corrupted shard, something like MMapIndexInput(path="path/to/es/db/nodes/node_number/indices/name_of_index/1/index/some_file") (here the 1 is the shard number).
Normally, deleting path/to/es/db/nodes/node_number/indices/name_of_index/1 should let the shard recover. If you still see it unassigned, try sending this command to your cluster (as per the documentation it should work, though I'm not sure about the ES 1.x syntax and commands):
POST _cluster/reroute
{
  "commands": [
    {
      "allocate": {
        "index": "myIndexName",
        "shard": 1,
        "node": "myNodeName",
        "allow_primary": true
      }
    }
  ]
}
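For what it's worth, on Elasticsearch 5.x and later the plain allocate command shown above no longer exists; forcing an empty primary onto a node (which discards whatever data that shard held) is done with allocate_empty_primary and an explicit accept_data_loss flag. A sketch with the same placeholder index and node names:
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "myIndexName",
        "shard": 1,
        "node": "myNodeName",
        "accept_data_loss": true
      }
    }
  ]
}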

ElasticSearch UNASSIGNED indices fix without data loss

For whatever reason a bunch of indices became UNASSIGNED. I'm looking for a way to assign them to a cluster node without losing any data.
I tried using the following API call, but it results in data loss, unfortunately (due to allow_primary):
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "index-name",
      "shard" : "0",
      "allow_primary" : true,
      "node" : "node-name"
    }
  } ]
}'
I also keep getting the following entries in elasticsearch.log:
[2015-03-16 11:51:12,181][DEBUG][action.search.type ] [cluster node] All shards failed for phase: [query_fetch]
[2015-03-16 11:51:12,450][DEBUG][action.search.type ] [cluster node] All shards failed for phase: [query_fetch]
[2015-03-16 11:51:19,349][DEBUG][action.bulk ] [cluster node] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2015-03-16 11:51:20,057][DEBUG][action.bulk ] [cluster node] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
Any help would be appreciated.
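Before forcing anything with allow_primary, it may help to list exactly which shards are unassigned and whether the cluster still knows about a copy on some node; a minimal sketch (the explicit column list, including unassigned.reason, is only available on newer versions, so treat it as an assumption for a 1.x-era cluster):
curl -XGET 'localhost:9200/_cat/shards?v'
# On newer versions the reason for a shard being unassigned can be listed explicitly:
curl -XGET 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason'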
