Remove unassigned Graylog2 index/shards with Bash in a loop - bash

There were some disk issues on a Graylog2 server I use for debug logs. There are unassigned shards now:
curl -XGET http://host:9200/_cat/shards
graylog_292 1 p STARTED 751733 648.4mb 127.0.1.1 Doctor Leery
graylog_292 1 r UNASSIGNED
graylog_292 2 p STARTED 756663 653.2mb 127.0.1.1 Doctor Leery
graylog_292 2 r UNASSIGNED
graylog_290 0 p STARTED 299059 257.2mb 127.0.1.1 Doctor Leery
graylog_290 0 r UNASSIGNED
graylog_290 3 p STARTED 298759 257.1mb 127.0.1.1 Doctor Leery
graylog_290 3 r UNASSIGNED
graylog_290 1 p STARTED 298314 257.3mb 127.0.1.1 Doctor Leery
graylog_290 1 r UNASSIGNED
graylog_290 2 p STARTED 297722 257.1mb 127.0.1.1 Doctor Leery
graylog_290 2 r UNASSIGNED
....
It's over 400 shards. I can delete them without data loss, because it's a single node setup. In order to do this I need to loop over the index (graylog_xxx) and over the shard (1,2,...).
How do I loop over this (2 variables) with Bash? There are 2 variables for the deletion API call, which I need to replace (afaik):
curl -XPOST 'host:9200/_cluster/reroute' -d '{
"commands" : [ {
"allocate" : {
"index" : "$index",
"shard" : $shard,
"node" : "Doctor Leery",
"allow_primary" : true
}
}
]
}'
What also bothers me about this is, that the unassigned shards have no node. But in the API call I need to specify one.

Form the _cat/shards output you shared, it simply looks like those are unassigned replicas, which you can simply remove by updating the cluster settings and setting the replica count to 0, like this:
curl -XPUT 'localhost:9200/_settings' -d '{
"index" : {
"number_of_replicas" : 0
}
}'
After running the above curl, your cluster will be green again.

Related

Debugging Elasticsearch and tuning on small server, single node

I am posting a more general question, after having found I may have more issues than low disk space:
optimise server operations with elasticsearch : addressing low disk watermarks and authentication failures
My issue is that my ES server crashes occasionally, and cannot figure out why.
I want to ensure reliability at least of days, and if error occur, restart the instance automatically.
Which best practices could I follow to debug ES on a small server instance, using a single node?
This is what I am looking at:
(useful resource - https://www.datadoghq.com/blog/elasticsearch-unassigned-shards/)
Check on available disk space - optimise server operations with elasticsearch : addressing low disk watermarks and authentication failures
Check on ES log (/var/log/elasticsearch):
...
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:351) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:373) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:359) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:651) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:536) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:490) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:450) [netty-transport-4.1.6.Final.jar:4.1.6.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.6.Final.jar:4.1.6.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: org.elasticsearch.action.NoShardAvailableActionException
... 60 more
[2020-05-12T15:05:56,874][INFO ][o.e.c.r.a.AllocationService] [awesome3-master] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[en-awesome-wiki][2]] ...]).
[2020-05-12T15:10:48,998][DEBUG][o.e.a.a.c.a.TransportClusterAllocationExplainAction] [awesome3-master] explaining the allocation for [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false], found shard [[target-validation][4], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2020-05-12T15:05:54.260Z], delayed=false, allocation_status[no_attempt]]]
I spotted somewhere a shared allocation error. So I check:
curl -s 'localhost:9200/_cat/allocation?v'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
15 616.2mb 10.6gb 12.5gb 23.1gb 45 127.0.0.1 127.0.0.1 awesome3-master
15 UNASSIGNED
What does this mean ? Are the indexed duplicated in more replicas (see below) ?
I check
curl -XGET localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason| grep UNASSIGNED
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1425 100 1425 0 0 5137 0 --:--:-- --:--:-- --:--:-- 5144
target-validation 4 r UNASSIGNED CLUSTER_RECOVERED
target-validation 2 r UNASSIGNED CLUSTER_RECOVERED
target-validation 1 r UNASSIGNED CLUSTER_RECOVERED
target-validation 3 r UNASSIGNED CLUSTER_RECOVERED
target-validation 0 r UNASSIGNED CLUSTER_RECOVERED
it-tastediscovery-expo 4 r UNASSIGNED CLUSTER_RECOVERED
it-tastediscovery-expo 2 r UNASSIGNED CLUSTER_RECOVERED
it-tastediscovery-expo 1 r UNASSIGNED CLUSTER_RECOVERED
it-tastediscovery-expo 3 r UNASSIGNED CLUSTER_RECOVERED
it-tastediscovery-expo 0 r UNASSIGNED CLUSTER_RECOVERED
en-awesome-wiki 4 r UNASSIGNED CLUSTER_RECOVERED
en-awesome-wiki 2 r UNASSIGNED CLUSTER_RECOVERED
en-awesome-wiki 1 r UNASSIGNED CLUSTER_RECOVERED
en-awesome-wiki 3 r UNASSIGNED CLUSTER_RECOVERED
en-awesome-wiki 0 r UNASSIGNED CLUSTER_RECOVERED
and here I have a question: is ES trying to create new replicas each time an error is failing the system ?
So I look at an explaination:
curl -XGET localhost:9200/_cluster/allocation/explain?pretty
{
"shard" : {
"index" : "target-validation",
"index_uuid" : "ONFPE7UQQzWjrhG0ztlSdw",
"id" : 4,
"primary" : false
},
"assigned" : false,
"shard_state_fetch_pending" : false,
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2020-05-12T15:05:54.260Z",
"delayed" : false,
"allocation_status" : "no_attempt"
},
"allocation_delay_in_millis" : 60000,
"remaining_delay_in_millis" : 0,
"nodes" : {
"Ynm6YG-MQyevaDqT2n9OeA" : {
"node_name" : "awesome3-master",
"node_attributes" : { },
"store" : {
"shard_copy" : "AVAILABLE"
},
"final_decision" : "NO",
"final_explanation" : "the shard cannot be assigned because allocation deciders return a NO decision",
"weight" : 9.5,
"decisions" : [
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Ynm6YG-MQyevaDqT2n9OeA] on which it already exists"
}
]
}
}
}
Now, I would like to better understand what a shard is and what ES is attempting to do.
Should I delete unused replicas?
And finally, what should I do to test the service is "sufficiently" reliable ?
Kindly let me know if there are best practices to follow for debugging ES and tuning server.
My constraint are a small server and would be happy if server won't crash, just take a little bit longer.
EDIT
Found this very useful question :
Shards and replicas in Elasticsearch
and this answer may offer a solution:
https://stackoverflow.com/a/50641899/305883
Before testing it out as an answer, could you kindly help to figure out if / how back-up the indexes and estimating correct parameters?
I run 1 single server and assume, given the above configurations, number_of_shards should be 1 (1 single machine) and max number_of_replicas could be 2 (disk size should handle it) :
curl -XPUT 'localhost:9200/sampleindex?pretty' -H 'Content-Type: application/json' -d '
{
"settings":{
"number_of_shards":1,
"number_of_replicas":2
}
}'

Delete unused data elasticsearch

I'm new in using elasticseach. I to use elasticsearch to aggregate logs. My problem is with the storage, I deleted all indices and now I have only one index.
When I call /_cat/allocation?v disk.indices is 23.9mb and disk.used is 16.4gb. Why is this difference? How can I remove unused data or how can I remove properly indices?
I ran the command:
curl -XPOST "elasticsearch:9200/_forcemerge?only_expunge_deletes=true"
But I didn't see any improvement.
Output of _cat/allocation?v :
shards disk.indices disk.used disk.avail
12 24.3mb 16.4gb 22.7gb
Output of _cat/shards?v :
index shard prirep state docs store ip node
articles 0 p STARTED 3666 24.2mb 192.168.1.21 lW9hsd5
articles 0 r UNASSIGNED
storage_test 2 p STARTED 0 261b 192.168.1.21 lW9hsd5
storage_test 2 r UNASSIGNED
storage_test 3 p STARTED 0 261b 192.168.1.21 lW9hsd5
storage_test 3 r UNASSIGNED
storage_test 4 p STARTED 0 261b 192.168.1.21 lW9hsd5
storage_test 4 r UNASSIGNED
storage_test 1 p STARTED 0 261b 192.168.1.21 lW9hsd5
storage_test 1 r UNASSIGNED
storage_test 0 p STARTED 0 261b 192.168.1.21 lW9hsd5
storage_test 0 r UNASSIGNED
twitter 3 p STARTED 1 4.4kb 192.168.1.21 lW9hsd5
twitter 3 r UNASSIGNED
twitter 2 p STARTED 0 261b 192.168.1.21 lW9hsd5
twitter 2 r UNASSIGNED
twitter 4 p STARTED 0 261b 192.168.1.21 lW9hsd5
twitter 4 r UNASSIGNED
twitter 1 p STARTED 0 261b 192.168.1.21 lW9hsd5
twitter 1 r UNASSIGNED
twitter 0 p STARTED 0 261b 192.168.1.21 lW9hsd5
twitter 0 r UNASSIGNED
.kibana 0 p STARTED 4 26.4kb 192.168.1.21 lW9hsd5
Thanks
https://www.elastic.co/guide/en/elasticsearch/guide/current/delete-doc.html
As already mentioned in Updating a Whole Document, deleting a document
doesn’t immediately remove the document from disk; it just marks it as
deleted. Elasticsearch will clean up deleted documents in the
background as you continue to index more data.
You might be facing some side effects of a _forcemerge on an a non-read-only index:
Warning: Force merge should only be called against read-only indices. Running force merge against a read-write index can cause very large segments to be produced (>5Gb per segment), and the merge policy will never consider it for merging again until it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
In this case I would suggest to first make the index read-only:
PUT your_index/_settings
{
"index": {
"blocks.read_only": true
}
}
Then to do force merge again and enable back writing to the index:
PUT your_index/_settings
{
"index": {
"blocks.read_only": false
}
}
In case this does not work, you can do a reindex from an old index into a new index and then delete the old index.
Is there a better way of deleting old logs?
Looks like you want to delete old log messages. Although you could issue a delete by query, there is in fact a better way: using Rollover API.
The idea is to create a new index every time the old index gets too big. Writes will happen into a fixed alias, and Rollover API will make alias point into a new index when the old one is too old or too big. Then to delete the old data you will only have to delete the old indexes.
Hope that helps!

How to delete unassigned shards in elasticsearch?

I have only one node on one computer and the index have 5 shards without replicas. Here are some parameters describe my elasticsearch node(healthy indexes are ignored in the following list):
GET /_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
red open datas 5 0 344999414 0 43.9gb 43.9gb
GET _cat/shards
datas 4 p STARTED 114991132 14.6gb 127.0.0.1 Eric the Red
datas 3 p STARTED 114995287 14.6gb 127.0.0.1 Eric the Red
datas 2 p STARTED 115012995 14.6gb 127.0.0.1 Eric the Red
datas 1 p UNASSIGNED
datas 0 p UNASSIGNED
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
14 65.9gb 710gb 202.8gb 912.8gb 77 127.0.0.1 127.0.0.1 Eric the Red
3 UNASSIGNED
Although deleting created shards doesn't seem to be supported, as mentioned on the comments above, reducing the number of replicas to zero for the indexes with UNASSIGNED shards might do the job, at least for single node clusters.
PUT /{my_index}/_settings
{
"index" : {
"number_of_replicas" : 0
}
}
reference
You can try deleting unassigned shard following way (Not sure though if it works for data index, works for marvel indices)
1) Install elasticsearch plugin - head. Refer Elastic Search Head Plugin Installation
2) Open your elasticsearch plugin - head URL in brwoser. From here you can easily check out which are unassigned shards and other related info. This will display infor regarding that shard.
{
"state": "UNASSIGNED",
"primary": true,
"node": null,
"relocating_node": null,
"shard": 0,
"index": ".marvel-es-2016.05.18",
"version": 0,
"unassigned_info": {
"reason": "DANGLING_INDEX_IMPORTED",
"at": "2016-05-25T05:59:50.678Z"
}
}
from here you can copy index name i.e. .marvel-es-2016.05.18.
3) Now you can run this query in sense
DELETE .marvel-es-2016.05.18
Hope this helps !

elasticsearch - remove a second elasticsearch node and add an other node, get unassigned shards

As a starter in Elasticsearch, I just use it for two weeks ago and I have just did a silly thing.
My Elasticsearch has one cluster with two nodes, one master-data node (version 1.4.2), one non-data node (version 1.1.1).There was conflict version when using, I decided to shutdown and delete the non-data node then, install another data node (version 1.4.2) See my image for easy imagine. node3 is named node2 then
Then, I check elastic status
{
"cluster_name":"elasticsearch",
"status":"yellow",
"timed_out":false,
"number_of_nodes":2,
"number_of_data_nodes":2,
"active_primary_shards":725,
"active_shards":1175,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":273
}
Check the cluster state
curl -XGET http://localhost:9200/_cat/shards
logstash-2015.03.25 2 p STARTED 3031 621.1kb 10.146.134.94 node1
logstash-2015.03.25 2 r UNASSIGNED
logstash-2015.03.25 0 p STARTED 3084 596.4kb 10.146.134.94 node1
logstash-2015.03.25 0 r UNASSIGNED
logstash-2015.03.25 3 p STARTED 3177 608.4kb 10.146.134.94 node1
logstash-2015.03.25 3 r UNASSIGNED
logstash-2015.03.25 1 p STARTED 3099 577.3kb 10.146.134.94 node1
logstash-2015.03.25 1 r UNASSIGNED
logstash-2014.12.30 4 r STARTED 10.146.134.94 node2
logstash-2014.12.30 4 p STARTED 94 114.3kb 10.146.134.94 node1
logstash-2014.12.30 0 r STARTED 111 195.8kb 10.146.134.94 node2
logstash-2014.12.30 0 p STARTED 111 195.8kb 10.146.134.94 node1
logstash-2014.12.30 3 r STARTED 110 144kb 10.146.134.94 node2
logstash-2014.12.30 3 p STARTED 110 144kb 10.146.134.94 node1
I have read related question and tried to follow it but no luck. I also comment in the answer the error i got.
ElasticSearch: Unassigned Shards, how to fix?
https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html
elasticsearch - what to do with unassigned shards
http://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html#cluster-reroute
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [ {
"allocate" : {
"index" : "logstash-2015.03.25",
"shard" : 4,
"node" : "node2",
"allow_primary" : true
}
}
]
}'
I get
"routing_nodes":{"unassigned":[{"state":"UNASSIGNED","primary":false,"node":null,
"relocating_node":null,"shard":0,"index":"logstash-2015.03.25"}
And I followed the answer in https://stackoverflow.com/a/23781013/1920536
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
"transient" : {
"cluster.routing.allocation.enable" : "all"
}
}'
but no affection.
what should i do ?
Thank in advances.
Update: when I check pending task, it shows that:
{"tasks":[{"insert_order":88401,"priority":"HIGH","source":"shard-failed
([logstash-2015.01.19][3], node[PVkS47JyQQq6G-lstUW04w], [R], s[INITIALIZING]),
**reason [Failed to start shard, message** [RecoveryFailedException[[logstash-2015.01.19][3]: **Recovery failed from** [node1][_72bJJX0RuW7AyM86WUgtQ]
[localhost][inet[/localhost:9300]]{master=true} into [node2][PVkS47JyQQq6G-lstUW04w]
[localhost][inet[/localhost:9302]]{master=false}];
nested: RemoteTransportException[[node1][inet[/localhost:9300]]
[internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[logstash-2015.01.19][3] Phase[2] Execution failed];
nested: RemoteTransportException[[node2][inet[/localhost:9302]][internal:index/shard/recovery/prepare_translog]];
nested: EngineCreationFailureException[[logstash-2015.01.19][3] **failed to create engine];
nested: FileSystemException**[data/elasticsearch/nodes/0/indices/logstash-2015.01.19/3/index/_0.si: **Too many open files**]; ]]","executing":true,"time_in_queue_millis":53,"time_in_queue":"53ms"}]}
If you have two nodes like
1) Node-1 - ES 1.4.2
2) Node-2 - Es 1.1.1
Now follow these steps to debug.
1) Stop all elasticsearch instance from Node-2.
2) Install elasticsearch 1.4.2 in new elasticsearch node.
Change elasticsearch.yml as master node configuration , specially these three config settings
cluster.name: <Same as master node>
node.name: < Node name for Node-2>
discovery.zen.ping.unicast.hosts: <Master Node IP>
3) Restart Node-2 Elasticsearch.
4) Verify Node-1 logs.

Get rid of unassigned shard

I've an ELK stack with two ElasticSearch nodes running and the cluster state turned red due to some unassigned shards which I can't get rid of. Looking up the unassigned shard, resp. the incomplete index with:
# curl -s elastic01.local:9200/_cat/shards | grep "logstash-2014.09.29"
Shows:
logstash-2014.09.29 4 p STARTED 745489 481.3mb 10.165.98.107 Crimson and the Raven
logstash-2014.09.29 4 r STARTED 745489 481.3mb 10.165.98.106 Glenn Talbot
logstash-2014.09.29 0 p STARTED 781110 502.3mb 10.165.98.107 Crimson and the Raven
logstash-2014.09.29 0 r STARTED 781110 502.3mb 10.165.98.106 Glenn Talbot
logstash-2014.09.29 3 p INITIALIZING 10.165.98.107 Crimson and the Raven
logstash-2014.09.29 3 r UNASSIGNED
logstash-2014.09.29 1 p STARTED 762991 490.1mb 10.165.98.107 Crimson and the Raven
logstash-2014.09.29 1 r STARTED 762991 490.1mb 10.165.98.106 Glenn Talbot
logstash-2014.09.29 2 p STARTED 761811 491.3mb 10.165.98.107 Crimson and the Raven
logstash-2014.09.29 2 r STARTED 761811 491.3mb 10.165.98.106 Glenn Talbot
My attempt to assign the shard to the other node fails:
curl XPOST -s 'http://elastic01.local:9200/_cluster/reroute?pretty=true' -d '{
"commands" : [ {
"allocate" : {
"index" : "logstash-2014.09.29",
"shard" : 3 ,
"node" : "Glenn Talbot",
"allow_primary" : 1
}
}
]
}'
With:
NO(primary shard is not yet active)]
I can't really seem to find an API to push the shard states any further. How could I proceed here?
Just for a complete picture, that what the system health looks like:
{
"cluster_name" : "logstash_es",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 114,
"active_shards" : 228,
"relocating_shards" : 0,
"initializing_shards" : 1,
"unassigned_shards" : 1
}
Thank you for your time and help
I actually ran into this situation with ElasticSearch 1.5 just the other day. After initially getting the same error, I simply repeated the /_cluster/reroute request the next day for lack of other ideas, and it worked, and it put the cluster back into a green state immediately.

Resources