How to delete data from a particular shard - elasticsearch

I have an index with 5 primary shards and no replicas.
One of my shards (shard 1) is in the unassigned state. When I checked the log file, I found the error below:
2obv65.nvd, _2vfjgt.fdx, _3e3109.si, _3dwgm5_Lucene45_0.dvm, _3aks2g_Lucene45_0.dvd, _3d9u9f_76.del, _3e30gm.cfs, _3cvkyl_es090_0.tim, _3e309p.nvd, _3cvkyl_es090_0.blm]]; nested: FileNotFoundException[_101a65.si]; ]]
When I checked the index directory, I could not find the _101a65.si file for shard 1.
I am unable to locate the missing .si file, and despite many attempts I could not get shard 1 assigned again.
Is there any other way to get shard 1 assigned again, or do I need to delete all of shard 1's data?
Please suggest.

The stack trace normally shows the path to the corrupted shard, something like MMapIndexInput(path="path/to/es/db/nodes/node_number/indices/name_of_index/1/index/some_file") (here the 1 is the shard number).
Deleting path/to/es/db/nodes/node_number/indices/name_of_index/1 should let the shard be assigned again. Note that this discards whatever data that shard held, since the index has no replicas to recover from. If the shard still shows as unassigned, try sending this command to your cluster (per the documentation it should work, though I'm not sure about the ES 1.x syntax and commands):
POST _cluster/reroute
{
  "commands" : [
    {
      "allocate" : {
        "index" : "myIndexName",
        "shard" : 1,
        "node" : "myNodeName",
        "allow_primary": true
      }
    }
  ]
}
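For scripting the same call, the request body can be assembled in Python. This is a minimal sketch, assuming the cluster accepts the ES 1.x-style allocate command shown above (Elasticsearch 5.0 and later replaced it with allocate_replica, allocate_stale_primary and allocate_empty_primary). The helper name build_allocate_command is hypothetical, and the index and node names are the placeholders from the answer. The code only builds and prints the JSON; sending it is left to whichever HTTP client you use.

```python
import json


def build_allocate_command(index, shard, node, allow_primary=False):
    """Build the body for POST _cluster/reroute (ES 1.x-style 'allocate').

    Hypothetical helper for illustration; it does not contact the cluster.
    """
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # allow_primary=True forces allocation of a primary even
                    # if no valid copy exists, so the shard may come up empty.
                    "allow_primary": allow_primary,
                }
            }
        ]
    }


body = build_allocate_command("myIndexName", 1, "myNodeName", allow_primary=True)
print(json.dumps(body, indent=2))
```

Keeping allow_primary off by default makes the data-losing variant an explicit choice at the call site.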

Related

Elasticsearch cannot assign shard 0

I'm new to Elasticsearch and I have an index in red state because shard 0 cannot be assigned.
I found a way to get an explanation, but I'm still lost on understanding and fixing it. The server's version is 7.5.2.
curl -XGET 'http://localhost:9200/_cluster/allocation/explain' returns
{
  "index":"event_tracking",
  "shard":0,
  "primary":false,
  "current_state":"unassigned",
  "unassigned_info":{
    "reason":"CLUSTER_RECOVERED",
    "at":"2020-12-22T14:51:08.943Z",
    "last_allocation_status":"no_attempt"
  },
  "can_allocate":"no",
  "allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions":[
    {
      "node_id":"cfsLU-nnRTGQG1loc4hdVA",
      "node_name":"xxx-clustername",
      "transport_address":"127.0.0.1:9300",
      "node_attributes":{
        "ml.machine_memory":"7992242176",
        "xpack.installed":"true",
        "ml.max_open_jobs":"20"
      },
      "node_decision":"no",
      "deciders":[
        {
          "decider":"replica_after_primary_active",
          "decision":"NO",
          "explanation":"primary shard for this replica is not yet active"
        },
        {
          "decider":"same_shard",
          "decision":"NO",
          "explanation":"the shard cannot be allocated to the same node on which a copy of the shard already exists [[event_tracking][0], node[cfsLU-nnRTGQG1loc4hdVA], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=TObxz0EFQbylZsyTiIH7SA], unassigned_info[[reason=CLUSTER_RECOVERED], at[2020-12-22T14:51:08.943Z], delayed=false, allocation_status[fetching_shard_data]]]"
        },
        {
          "decider":"throttling",
          "decision":"NO",
          "explanation":"primary shard for this replica is not yet active"
        }
      ]
    }
  ]
}
I more or less understand the error message, but I can't find the proper way to fix it. This server is not running on Docker; it's installed directly on the Linux machine.
curl -XGET 'http://localhost:9200/_cat/recovery/event_tracking?v' result
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
event_tracking 0 54.5m existing_store translog n/a n/a 127.0.0.1 xxx-cluster n/a n/a 0 0 100.0% 106 0 0 100.0% 2857898852 7061000 6489585 91.9%
What can I try to resolve this?
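The explain output above is long, so it can help to pull out just the deciders that voted NO. A minimal sketch, assuming the response has been parsed into a dict with the field names shown above; the function name blocking_deciders is hypothetical, and the sample is a shortened copy of the response.

```python
def blocking_deciders(explain):
    """Return (node_name, decider, explanation) for every NO decision."""
    blocked = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blocked.append(
                    (node.get("node_name"), decider["decider"], decider["explanation"])
                )
    return blocked


# Shortened copy of the response from the question.
sample = {
    "index": "event_tracking",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "xxx-clustername",
            "deciders": [
                {
                    "decider": "replica_after_primary_active",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active",
                },
                {
                    "decider": "throttling",
                    "decision": "NO",
                    "explanation": "primary shard for this replica is not yet active",
                },
            ],
        }
    ],
}

for node_name, decider, explanation in blocking_deciders(sample):
    print(f"{node_name}: {decider}: {explanation}")
```

In this output the explained shard is the replica ("primary":false), and the reasons point at the primary: the same_shard explanation and the _cat/recovery line both show the primary copy still initializing and replaying its translog.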

Deleting index from ElasticSearch (via Kibana) automatically being recreated?

I have created an ElasticSearch instance via AWS and have pushed some test data into it in order to play around with Kibana. I'm done playing around now and want to delete all my data and start again. I have run a delete command on my index:
Command
DELETE /uniqueindex
Response
{
  "acknowledged" : true
}
However almost immediately my index seems to re-appear and documents start appearing in the count of documents as well.
Command
GET /_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 e3LQWRvgSvqSL8CFTyw_SA 1 0 3 0 15.2kb 15.2kb
yellow open uniqueindex Y4tlNxAXQVKUs_DjVQLNnA 5 1 713 0 421.7kb 421.7kb
It's like it's auto generating after the delete. Clearly a setting or something, but being new to ElasticSearch/Kibana I'm not sure what I'm missing.
By default, indices in Elasticsearch are created automatically just by PUTing or POSTing a document.
You can change this behavior with action.auto_create_index, which lets you either disable automatic creation entirely (indices then need to be created explicitly with a PUT command) or whitelist specific indices only.
Quoting from the linked docs:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}
A + prefix allows automatic index creation for matching names, while a - prefix forbids it.
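To make the pattern list concrete, here is a small Python model of how such a list can be read. It is a sketch, not Elasticsearch's implementation: it assumes patterns are checked left to right, the first match decides, an unprefixed pattern counts as +, and a name that matches nothing is not auto-created. The function name auto_create_allowed is hypothetical, and fnmatch stands in for Elasticsearch's own wildcard matching.

```python
from fnmatch import fnmatchcase


def auto_create_allowed(setting, index_name):
    """Model of action.auto_create_index: the first matching pattern wins."""
    if setting in ("true", "false"):
        return setting == "true"
    for raw in setting.split(","):
        pattern = raw.strip()
        allow = not pattern.startswith("-")  # '-' forbids, '+' or no prefix allows
        if pattern.startswith(("+", "-")):
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allow
    return False  # assumption: no matching pattern means no auto-creation


setting = "twitter,index10,-index1*,+ind*"
for name in ("twitter", "index10", "index12", "indigo", "uniqueindex"):
    print(name, auto_create_allowed(setting, name))
```

Under these assumptions index10 is allowed because its exact entry comes before -index1*, while index12 is caught by -index1* before it can reach +ind*.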

Finding out on which data path shard is located in Elasticsearch

I have multiple path.data entries configured for my Elasticsearch cluster.
The official documentation states that a single shard uses only a single path, so a shard is never split across multiple paths.
I'd like to find out which path on which node is used for a specific shard (primary or replica), for example: index my-index, primary shard 0 → node RQzJvAgLTDOnEnmIjYU9FA, path /mnt/data1. I tried /_nodes, /_stats, /_segments and /_shard_stores, but none of them contain any references to paths.
You can find that info using the indices stats API by specifying the level=shards parameter
GET index/_stats?level=shards
will return a structure like this
"indices": {
  "listings-master": {
    "primaries": {
      ...
    },
    "total": {
      ...
    },
    "shards": {
      "0": [
        {
          "shard_path": {
            "state_path": "/app/data/nodes/0",
            "data_path": "/app/data/nodes/0",
            "is_custom_data_path": false
          },
          ...
        }
        ...
The response isn't easy to read directly, but with a small Python script I got the info I wanted. Here is the script:
import json

# shard.json is assumed to hold the saved response of GET index/_stats?level=shards
with open('shard.json') as json_file:
    data = json.load(json_file)

data = data['indices']
# Print one line per shard copy: index, shard number, node ID, data path
for indice in data:
    shards = data[indice]['shards']
    for nshard in shards.keys():
        # Each shard number maps to a list of copies (primary and replicas)
        for elt in shards[nshard]:
            path = elt['shard_path']['data_path']
            node = elt['routing']['node']
            print(indice, '\t', nshard, '\t', node, '\t', path)
Then you get output like
log-2020.11.06 1 oxx /datassd/elasticsearch/nodes/0
log-2020.11.06 0 oxx /datassd/elasticsearch/nodes/0
log-2020.11.05 1 oxx /datassd/elasticsearch/nodes/0

Courier Fetch: shards failed

Why do I get these warnings after adding more data to my elasticsearch?
And the warnings are different every time I browse the dashboard.
"Courier Fetch: 30 of 60 shards failed."
More details:
It's a single node on CentOS 7.1
/etc/elasticsearch/elasticsearch.yml
index.number_of_shards: 3
index.number_of_replicas: 1
bootstrap.mlockall: true
threadpool.bulk.queue_size: 1000
indices.fielddata.cache.size: 50%
threadpool.index.queue_size: 400
index.refresh_interval: 30s
index.number_of_shards: 5
index.number_of_replicas: 1
/usr/share/elasticsearch/bin/elasticsearch.in.sh
ES_HEAP_SIZE=3G
#I use this Garbage Collector instead of the default one.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
cluster status
{
  "cluster_name" : "my_cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 61,
  "active_shards" : 61,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 61
}
cluster details
{
  "cluster_name" : "my_cluster",
  "nodes" : {
    "some weird number" : {
      "name" : "ES 1",
      "transport_address" : "inet[localhost/127.0.0.1:9300]",
      "host" : "some host",
      "ip" : "150.244.58.112",
      "version" : "1.4.4",
      "build" : "c88f77f",
      "http_address" : "inet[localhost/127.0.0.1:9200]",
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 7854,
        "max_file_descriptors" : 65535,
        "mlockall" : false
      }
    }
  }
}
I'm curious about the "mlockall" : false because on the yml I did write bootstrap.mlockall: true
logs
lots of lines like:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@a9a34f5
For me tuning the threadpool search queue_size solved the issue. I tried a number of other things and this is the one that solved it.
I added this to my elasticsearch.yml
threadpool.search.queue_size: 10000
and then restarted elasticsearch.
Reasoning... (from the docs)
A node holds several thread pools in order to improve how threads
memory consumption are managed within a node. Many of these pools also
have queues associated with them, which allow pending requests to be
held instead of discarded.
and for search in particular...
For count/search operations. Defaults to fixed with a size of int((#
of available_processors * 3) / 2) + 1, queue_size of 1000.
For more information you can refer to the elasticsearch docs here...
I had trouble finding this information so I hope this helps others!
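The sizing formula quoted above is easy to evaluate for a given machine. A minimal sketch, assuming the formula applies exactly as quoted (it comes from the docs of that era and may differ in other versions); the function name default_search_pool_size is hypothetical.

```python
def default_search_pool_size(available_processors):
    """Search pool threads per the quoted formula: int((n * 3) / 2) + 1."""
    return int((available_processors * 3) / 2) + 1


for cpus in (1, 2, 4, 8):
    print(cpus, "processors ->", default_search_pool_size(cpus), "search threads")
```

With 4 processors this gives 7 threads. Requests beyond what those threads can handle wait in the queue, and once queue_size is reached further requests are rejected, which is the EsRejectedExecutionException seen in the logs.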
I got this error when my query was missing a closing quote:
field:"value
In my ElasticSearch logs I see these exceptions:
Caused by: org.elasticsearch.index.query.QueryShardException:
Failed to parse query [field:"value]
...
Caused by: org.apache.lucene.queryparser.classic.ParseException:
Cannot parse 'field:"value': Lexical error at line 1, column 13.
Encountered: <EOF> after : "\"value"
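A quick client-side check can catch this before the query reaches Elasticsearch. The sketch below is a heuristic, not the Lucene query parser: it only counts double quotes that are not escaped with a backslash and reports whether they are balanced. The function name has_balanced_quotes is hypothetical.

```python
def has_balanced_quotes(query):
    """Return True if the number of unescaped double quotes is even."""
    count = 0
    escaped = False
    for ch in query:
        if escaped:
            escaped = False  # this character was escaped by a preceding backslash
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            count += 1
    return count % 2 == 0


print(has_balanced_quotes('field:"value'))   # the failing query from above
print(has_balanced_quotes('field:"value"'))  # the same query with the quote closed
```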
In Elasticsearch 5.4, thread_pool has an underscore in it:
thread_pool.search.queue_size: 10000
See the Elasticsearch Thread Pool module documentation.
This is likely an indication that there's a problem with your cluster's health. Without knowing more about your cluster, there's not much more that can be said.
I agree with @Philip's opinion, but it's not necessary to restart Elasticsearch, at least on Elasticsearch >= 1.5.2, because you can set threadpool.search.queue_size dynamically.
curl -XPUT http://your_es:9200/_cluster/settings -d '{
  "transient": {
    "threadpool.search.queue_size": 10000
  }
}'
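The same request can be prepared from Python with only the standard library. This is a sketch under the answer's assumptions: the setting name threadpool.search.queue_size and its dynamic update are what the answer reports for 1.x, and other versions may reject it. The function name build_queue_size_request is hypothetical, your_es is the placeholder host from the command above, and the Content-Type header is an addition that is not in the original command. The code builds the request without sending it.

```python
import json
import urllib.request


def build_queue_size_request(base_url, queue_size):
    """Prepare (but do not send) a PUT to _cluster/settings."""
    body = {"transient": {"threadpool.search.queue_size": queue_size}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_queue_size_request("http://your_es:9200", 10000)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Nothing is sent until urllib.request.urlopen(request) is called, so the body can be inspected first.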
From Elasticsearch 5 onward, it's not possible to update thread_pool.search.queue_size through the _cluster/settings API. In my case, updating the Elasticsearch node's yml file is not an option either, since if a node fails, the auto scaling code would bring up another ES node with the default yml settings.
I have a cluster with 3 nodes and 400 active primary shards, with 7 active threads and a queue size of 1000. Increasing the number of nodes to 5 with a similar config resolved the issue, as queries are now distributed horizontally across more available nodes.
This will not work on Elasticsearch 5.6; the request is rejected with:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[colmbmiscxx.xx][172.29.xx.xx:9300][cluster:admin/settings/update]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "transient setting [threadpool.search.queue_size], not dynamically updateable"
  },
  "status": 400
}

ElasticSearch UNASSIGNED indices fix without data loss

For whatever reason, a bunch of indices became UNASSIGNED. I'm looking for a way of assigning them to a cluster node without losing any data.
I tried using the following API call, but it results in data loss, unfortunately (due to allow_primary):
curl -XPOST 'localhost:9200/_cluster/reroute?pretty' -d '{
  "commands" : [
    {
      "allocate" : {
        "index" : "index-name",
        "shard" : "0",
        "allow_primary" : true,
        "node" : "node-name"
      }
    }
  ]
}'
I also keep getting the following entries in elasticsearch.log:
[2015-03-16 11:51:12,181][DEBUG][action.search.type ] [cluster node] All shards failed for phase: [query_fetch]
[2015-03-16 11:51:12,450][DEBUG][action.search.type ] [cluster node] All shards failed for phase: [query_fetch]
[2015-03-16 11:51:19,349][DEBUG][action.bulk ] [cluster node] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2015-03-16 11:51:20,057][DEBUG][action.bulk ] [cluster node] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
Any help would be appreciated.
