Elasticsearch: evacuate all data before shutdown of a data node? - elasticsearch

Is there a way to tell a node to remove all of its data (spread it back out among the other nodes) so that I can shut it down and not deal with a rebalance/re-replication once it's down?
If I have 2 copies of each shard and I drop one node, some of the shards now only have 1 live copy and have to be re-replicated. I'd prefer not to drop down to 1 live copy for any period of time if I can avoid it.

After posting to the ES mailing list, I was informed the proper answer lies in the _cluster/settings api, specifically the cluster.routing.allocation.exclude._ip option.
From the docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "10.0.0.1"
  }
}'
The IP address can be a comma separated list. To 'un-exclude', just remove the IP from the list (or set the list to "" to remove all excluded IPs).
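For example, once the node has been drained and shut down, a minimal sketch of clearing all exclusions again looks like this:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : ""
  }
}'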
Hopefully this helps others looking for the answer to this same question.

Related

How to get number of current open shards in elasticsearch cluster?

I can't find where to get the number of currently open shards.
I want to set up monitoring to avoid cases like this:
this cluster currently has [999]/[1000] maximum shards open
I can get the maximum limit - max_shards_per_node:
$ curl -X GET "${ELK_HOST}/_cluster/settings?include_defaults=true&flat_settings=true&pretty" 2>/dev/null | grep cluster.max_shards_per_node
"cluster.max_shards_per_node" : "1000",
$
But I can't find out how to get the number of currently open shards (999).
A very simple way to get this information is to call the _cat/shards API and count the number of lines using the wc shell command:
curl -s -XGET ${ELK_HOST}/_cat/shards | wc -l
That will yield a single number that represents the number of shards in your cluster.
Another option is to retrieve the shard list in JSON format, pipe the results into jq and then grab whatever you want, e.g. below I'm counting all STARTED shards:
curl -s -XGET ${ELK_HOST}/_cat/shards?format=json | jq ".[].state" | grep "STARTED" | wc -l
Yet another option is to query the _cluster/stats API:
curl -s -XGET ${ELK_HOST}/_cluster/stats?filter_path=indices.shards.total
That will return a JSON with the shard count
{
  "indices" : {
    "shards" : {
      "total" : 302
    }
  }
}
To my knowledge, there is no API that returns this as a single ready-made number. To be sure of that, let's look at the source code.
The error is thrown from IndicesService.java
To see how currentOpenShards is computed, we can then go to Metadata.java.
As you can see, the code iterates over the index metadata retrieved from the cluster state, which is pretty much like running the following command and counting the shards, but only for indices with "state" : "open":
GET _cluster/state?filter_path=metadata.indices.*.settings.index.number_of*,metadata.indices.*.state
From that evidence, we can pretty much be sure that the single number you're looking for is nowhere to be found, but needs to be computed by one of the methods I showed above. You're free to open a feature request if needed.
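Building on that, here is a sketch (assuming jq is available) that computes the same number from that filtered cluster state, counting primaries plus replicas for open indices only:
curl -s -XGET "${ELK_HOST}/_cluster/state?filter_path=metadata.indices.*.settings.index.number_of*,metadata.indices.*.state" \
  | jq '[.metadata.indices[]
         | select(.state == "open")
         | (.settings.index.number_of_shards | tonumber) * ((.settings.index.number_of_replicas | tonumber) + 1)]
        | add'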
The problem: it seems that your Elasticsearch cluster is hitting the limit on the number of shards per node.
Solution:
Verify the number of shards per node in your configuration and increase it using the Elasticsearch API.
To get the number of shards, use the _cluster/stats API:
curl -s -XGET 'localhost:9200/_cluster/stats?filter_path=indices.shards.total'
From elastic docs:
The Cluster Stats API allows to retrieve statistics from a cluster
wide perspective. The API returns basic index metrics (shard numbers,
store size, memory usage) and information about the current nodes that
form the cluster (number, roles, os, jvm versions, memory usage, cpu
and installed plugins).
To update the shard limit (increasing or decreasing it), use the _cluster/settings API:
For example:
curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_cluster/settings' -d '{ "persistent" : {"cluster.max_shards_per_node" : 5000}}'
From elastic docs:
With specifications in the request body, this API call can update cluster settings. Updates to settings can be persistent, meaning they apply across restarts, or transient, where they don't survive a full cluster restart.
You can reset persistent or transient settings by assigning a null value. If a transient setting is reset, the first one of these values that is defined is applied:
the persistent setting
the setting in the configuration file
the default value
The order of precedence for cluster settings is:
transient cluster settings
persistent cluster settings
settings in the elasticsearch.yml configuration file
It's best to set all cluster-wide settings with the settings API and use the elasticsearch.yml file only for local configurations. This way you can be sure that the setting is the same on all nodes. If, on the other hand, you define different settings on different nodes by accident using the configuration file, it is very difficult to notice these discrepancies.
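For example, following the reset behaviour described above, the limit set earlier can later be returned to its default by assigning it a null value:
curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_cluster/settings' -d '{ "persistent" : {"cluster.max_shards_per_node" : null}}'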
Another one-liner sums primary and replica shards for all open indices from _cat/indices (note the $5 * ($6 + 1) term: each primary plus its replicas):
curl -s '127.1:9200/_cat/indices' | awk '{ if ($2 == "open") C += $5 * ($6 + 1) } END { print C }'
This works:
GET /_stats?level=shards&filter_path=_shards.total
Reference:
https://stackoverflow.com/a/38108448/4271117

Elasticsearch - How to remove stuck persistent setting after upgrade

I have just upgraded my cluster from 5.6 to 6.1. I did a rolling upgrade as the documentation specified. It looks like a setting that I was using isn't available anymore in 6.1. That would've been fine, but now I can't even re-enable shard allocation, so my last node won't allocate its shards. Doing something as simple as this:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
results in this:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[inoreader-es4][92.247.179.253:9300][cluster:admin/settings/update]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unknown setting [indices.store.throttle.max_bytes_per_sec] did you mean [indices.recovery.max_bytes_per_sec]?"
  },
  "status" : 400
}
No matter what setting I try to change I always get this error.
Yes, I did set indices.store.throttle.max_bytes_per_sec as a persistent setting once in 5.x, and I'm OK with having to set it under a new name now, but how can I even remove it? It's not in elasticsearch.yml.
You'll need to unset that value. If you are still on the old version, you can use the following command (unsetting with null was added in 5.0):
PUT _cluster/settings
{
  "persistent": {
    "indices.store.throttle.max_bytes_per_sec": null
  }
}
However, this will fail with a "persistent setting [indices.store.throttle.max_bytes_per_sec], not recognized" error if you have already upgraded your cluster.
At the moment (Elasticsearch version 6.1.1) the removed setting will be archived under archived.indices.store.throttle.max_bytes_per_sec. You can remove this and any other archived setting with:
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
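To confirm the archived settings are actually gone, you can list the cluster settings again (the flat_settings option just makes the keys easier to scan):
GET _cluster/settings?flat_settings=true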
However, there is a bug that only lets you unset archived settings before you change any other settings.
If you have already changed other settings and are affected by this bug, the only solution is to downgrade to 5.6 again, unset the setting (command at the top of this answer), and then do the upgrade again. It's probably enough to do this on one node (stop all others), as long as it's the master and all other nodes join that master and accept its corrected cluster state. Be sure to take a snapshot beforehand in any case.
For future versions the archived.* behavior will probably change as stated in the ticket (though it's just in the planning phase right now):
[...] we should not archive unknown and broken cluster settings.
Instead, we should fail to recover the cluster state. The solution for
users in an upgrade case would be to rollback to the previous version,
address the settings that would be unknown or broken in the next major
version, and then proceed with the upgrade.
Manually editing or even deleting the cluster state on disk sounds very risky: the cluster state includes a lot of information (check for yourself with GET /_cluster/state) like templates, indices, the routing table, and more. Even if you still have the data on the data nodes, without the cluster state you wouldn't be able to access your data (the "map" describing how to form indices out of the shards would be missing). If I remember correctly, in more recent ES versions the data nodes cache the cluster state and will try to restore from it, but that's a last resort and I wouldn't want to rely on it. Also, I'm not sure whether that might not bring back your bad setting as well.
PS: I can highly recommend the free upgrade assistant going from 5.6 to 6.x.

Not able to disallocate shard from ES cluster

I have created an ES cluster with ES running on three different machines. To make them into a cluster, I added the unicast config below to the elasticsearch.yml config file on all 3 machines.
discovery.zen.ping.unicast.hosts: [IP1, IP2, IP3]
When I run
curl -XGET localhost:9200/_cluster/health?pretty
I am getting number_of_nodes as 3. Now I want to remove one node from the cluster,
so without changing any config file I ran the command below:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : "IP_adress_of_Node3"
  }
}'
After this I ran the second command again to get the cluster details. The expected output is number_of_nodes = 2, but the result still shows number_of_nodes = 3 even after excluding the node. It would be of great help if someone could tell me what is wrong in the steps I followed for removing the node.
Thanks
The cluster.routing.allocation.exclude._ip setting that you sent to your cluster will not actually remove the node from the cluster, but rather prepare it for removal: it instructs Elasticsearch to move all shards held on this node away from it and store them on other nodes instead.
This allows you to then remove the node once it is empty, without causing under-replication of the shards stored on this node.
To actually remove the node from your cluster you would need to remove it from your list of unicast hosts. Of course, you can also just shut it down and leave it in the list until you next need to restart your cluster anyway; as far as I am aware, that won't hurt anything.
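To check that the excluded node has actually been drained before shutting it down, you can grep for its IP in the _cat/shards output (using the same placeholder as above); once relocation has finished, this should print nothing:
curl -s localhost:9200/_cat/shards | grep 'IP_adress_of_Node3'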

percolate returns empty matches under heavy load during elasticsearch cluster resizing

We have an Elasticsearch cluster that dynamically resizes with respect to the percolate message count in a RabbitMQ queue.
We have a single shard and ~18K queries in our index, and we use auto_expand_replicas: "0-all" in the index settings to copy the single shard to all nodes when a node becomes available.
But during heavy load and cluster resizing, some requests produce unexpected empty matches.
We send ~1M percolate requests daily and we were losing ~1K of them. We added a cluster status check to our code: if the cluster status is not green before and after a percolate request, we wait for green status and re-send the percolate request. This way we were able to reduce the lost content count from ~1K to ~100. We do not experience this problem in a cluster with a fixed node size.
Unfortunately, any loss is unacceptable in our scenario, and we don't want to give up auto scaling, so we need to find a workaround for this problem.
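A simplified sketch of the "check before" half of that workaround, using the wait_for_status and timeout parameters of the _cluster/health API ahead of each percolate request (index and type names match the test setup below):
# Block until the cluster reports green status (or the 30s timeout expires),
# then send the percolate request as usual.
curl -s -XGET 'localhost:9200/_cluster/health?wait_for_status=green&timeout=30s' > /dev/null
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "message" : "A new bonsai tree in the office"
  }
}'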
To reproduce the problem, you can use the following bash script:
https://gist.github.com/ekesken/de41598a1e7e54c6f33c
This script will download and install Elasticsearch 1.5.2 in your current directory, create a cluster with 10 nodes on your local machine, create the index and percolation queries, and start testing.
Normally we expect the following output for a single percolate request:
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "message" : "A new bonsai tree in the office"
  }
}'
{"took":95,"_shards":{"total":1,"successful":1,"failed":0},"total":1,"matches":[{"_index":"my-index","_id":"tree"}]}
After running the script, if you see that all shards on all nodes are started in the http://localhost:9200/_cat/shards response and the test script is still running, that means you couldn't reproduce the problem; try increasing the node count, which is 10 by default:
./repeat_percolation_loss.sh 15 test-only
When you reproduce the problem, the script will exit with the following output:
{"took":209,"_shards":{"total":1,"successful":1,"failed":0},"total":0,"matches":[]}
Problem repeated! Congratulations.
You can shut down all servers and clean up all directories and files created by the script with this command:
./repeat_percolation_loss.sh 15 clean
Replace the node count above with the last node count you tried.

High disk watermark exceeded even when there is not much data in my index

I'm using elasticsearch on my local machine. The data directory is only 37MB in size, but when I check the logs, I can see:
[2015-05-17 21:31:12,905][WARN ][cluster.routing.allocation.decider] [Chrome] high disk watermark [10%] exceeded on [h9P4UqnCR5SrXxwZKpQ2LQ][Chrome] free: 5.7gb[6.1%], shards will be relocated away from this node
Quite confused about what might be going wrong. Any help?
From Index Shard Allocation:
... watermark.high controls the high watermark. It defaults to 90%, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 90%.
The size of your actual index doesn't matter; it's the free space left on the device which matters.
If the defaults are not appropriate for you, you'll have to change them.
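To see how much disk space each node actually has free (which is what the watermark check looks at), you can use the _cat/allocation API:
curl -s 'localhost:9200/_cat/allocation?v'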
To resolve the issue where the log shows:
high disk watermark [90%] exceeded on
[ytI5oTyYSsCVfrB6CWFL1g][ytI5oTy][/var/lib/elasticsearch/nodes/0]
free: 552.2mb[4.3%], shards will be relocated away from this node
You can disable the disk threshold check entirely by executing the following curl request:
curl -XPUT "http://localhost:9200/_cluster/settings" \
-H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster": {
"routing": {
"allocation.disk.threshold_enabled": false
}
}
}
}'
This slightly modified curl command from the Elasticsearch 6.4 docs worked for me:
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"transient": {
"cluster.routing.allocation.disk.watermark.low": "2gb",
"cluster.routing.allocation.disk.watermark.high": "1gb",
"cluster.routing.allocation.disk.watermark.flood_stage": "500mb",
"cluster.info.update.interval": "1m"
}
}
'
If the curl PUT command succeeds, you should see log lines like these in the Elasticsearch terminal window:
[2018-08-24T07:16:05,584][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.low] from [85%] to [2gb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.high] from [90%] to [1gb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.routing.allocation.disk.watermark.flood_stage] from [95%] to [500mb]
[2018-08-24T07:16:05,585][INFO ][o.e.c.s.ClusterSettings ] [bhjM1bz] updating [cluster.info.update.interval] from [30s] to [1m]
https://www.elastic.co/guide/en/elasticsearch/reference/current/disk-allocator.html
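If you later want to go back to the default watermarks, the same transient settings can be reset by assigning null values (a sketch mirroring the command above):
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.info.update.interval": null
  }
}
'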
It's a warning and won't affect anything. The storage processors (SPs) use high and low watermarks to determine when to flush their write caches.
A possible solution is to free up some disk space,
and the warning will disappear. Even while it is showing, replicas will simply not be assigned to the node, which is okay; Elasticsearch will keep working fine.
Instead of percentages I use absolute values, and raise them for better disk use (in pre-prod):
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": true,
    "cluster.routing.allocation.disk.watermark.low": "1g",
    "cluster.routing.allocation.disk.watermark.high": "500m",
    "cluster.info.update.interval": "5m"
  }
}
I also increase the polling interval (cluster.info.update.interval) so that ES checks disk usage less often and the logs stay shorter.
Clear up some space on your hard drive; that should fix the issue. This should also change the health of your ES cluster from yellow to green (if you hit the above issue, you are most likely facing the yellow cluster health issue as well).
