Elasticsearch - How to remove stuck persistent setting after upgrade - elasticsearch

I have just upgraded my cluster from 5.6 to 6.1. I did a rolling upgrade as the documentation specified. It looks like a setting that I was using isn't available anymore in 6.1. That would've been fine, but now I can't even re-enable shard allocation, so my last node won't allocate its shards. Doing something as simple as this:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : "all"
  }
}
'
results in this:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[inoreader-es4][92.247.179.253:9300][cluster:admin/settings/update]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unknown setting [indices.store.throttle.max_bytes_per_sec] did you mean [indices.recovery.max_bytes_per_sec]?"
  },
  "status" : 400
}
No matter what setting I try to change I always get this error.
Yes, I did set indices.store.throttle.max_bytes_per_sec as a persistent setting once in 5.x, and I'm OK with having to set it under a new name now, but how can I even remove it? It's not in elasticsearch.yml.

You'll need to unset that value. If you are still on the old version, you can use the following command (unsetting with null was added in 5.0):
PUT _cluster/settings
{
  "persistent": {
    "indices.store.throttle.max_bytes_per_sec": null
  }
}
However, this will fail with a "persistent setting [indices.store.throttle.max_bytes_per_sec], not recognized" error if you have already upgraded your cluster.
At the moment (Elasticsearch version 6.1.1) the removed setting will be archived under archived.indices.store.throttle.max_bytes_per_sec. You can remove this and any other archived setting with:
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
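To check what is actually stored, you can list the cluster settings; with flat_settings the archived keys are easy to spot (the removed setting shows up as archived.indices.store.throttle.max_bytes_per_sec in the persistent block):
GET _cluster/settings?flat_settings=true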
However, there is a bug that only lets you unset archived settings before you change any other settings.
If you have already changed other settings and are affected by this bug, the only solution is to downgrade to 5.6 again, unset the configuration (command at the top of this answer), and then do the upgrade again. It's probably enough to do this on one node (stop all others) as long as it's the master and all other nodes join that master and accept its corrected cluster state. Be sure to take a snapshot beforehand in any case.
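A minimal sketch for that snapshot, assuming a filesystem repository (repository name, snapshot name, and path are placeholders, and the location has to be listed under path.repo in elasticsearch.yml):
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup"
  }
}

PUT _snapshot/my_backup/pre_downgrade?wait_for_completion=true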
For future versions the archived.* behavior will probably change as stated in the ticket (though it's just in the planning phase right now):
[...] we should not archive unknown and broken cluster settings.
Instead, we should fail to recover the cluster state. The solution for
users in an upgrade case would be to rollback to the previous version,
address the settings that would be unknown or broken in the next major
version, and then proceed with the upgrade.
Manually editing or even deleting the cluster state on disk sounds very risky: the cluster state includes a lot of information (check for yourself with GET /_cluster/state) like templates, indices, the routing table, ... Even if you still have the data on the data nodes, without the cluster state you wouldn't be able to access it (the "map" of how to form indices out of the shards is missing). If I remember correctly, in more recent ES versions the data nodes cache the cluster state and will try to restore from that, but that's a last resort and I wouldn't want to rely on it. Also, I'm not sure that it wouldn't bring back your bad setting as well.
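If you want to get a feeling for how much is actually in there, you can pull out individual parts of the cluster state instead of the full dump, for example (the index name is just a placeholder):
GET _cluster/state/metadata,routing_table
GET _cluster/state/metadata/my-index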
PS: I can highly recommend the free upgrade assistant going from 5.6 to 6.x.

Related

Elasticsearch posting documents : FORBIDDEN/12/index read-only / allow delete (api)

Running Elasticsearch version 7.3.0, I posted 50 million documents to my index. When trying to post more documents to Elasticsearch I keep getting this message:
response code: 403 cluster_block_exception [FORBIDDEN/12/index read-only / allow delete (api)];
Disk watermark exceeded
I have 40 GB of free disk space and have extended the disk, but I still keep getting this error.
Any thoughts on what could be causing this?
You must have hit the flood stage watermark at 95%. Where to go from here:
Free disk space (you seem to have done that already).
Optionally change the default setting. 5% of 400GB might be a bit too aggressive for blocking write operations. You can either use percent or absolute values for this — this is just a sample and you might want to pick different values:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "5gb",
    "cluster.routing.allocation.disk.watermark.high": "2gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "1gb",
    "cluster.info.update.interval": "1m"
  }
}
You must reset the index block (either per affected index or globally with _all):
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
BTW in 7.4 this will change: Once you go below the high watermark, the index will unlock automatically.
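To see where you stand against the watermarks and to verify that the block is really gone, something like this should work:
GET _cat/allocation?v
GET _all/_settings/index.blocks.read_only_allow_delete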

Best practices for ElasticSearch initial bulk import

I'm running a docker setup with ElasticSearch, Logstash, Filebeat and Kibana inspired by the Elastic Docker Compose. I need to do an initial load of 15 GB of logfiles into the system (Filebeat -> Logstash -> ElasticSearch), but I'm having some issues with performance.
It seems that Filebeat/Logstash is outputting too much work for ElasticSearch. After some time I begin to see a bunch of errors in ElasticSearch like this:
[INFO ][o.e.i.IndexingMemoryController] [f8kc50d] now throttling indexing for shard [log-2017.06.30]: segment writing can't keep up
I've found this old documentation article on how to disable merge throttling: https://www.elastic.co/guide/en/elasticsearch/guide/master/indexing-performance.html#segments-and-merging.
PUT /_cluster/settings
{
  "transient" : {
    "indices.store.throttle.type" : "none"
  }
}
But in the current version (ElasticSearch 6) it gives me this error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "transient setting [indices.store.throttle.type], not dynamically updateable"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "transient setting [indices.store.throttle.type], not dynamically updateable"
  },
  "status": 400
}
How can I solve the above issue?
The VM has 4 CPU cores (Intel Xeon E5-2650) and ElasticSearch is assigned 4GB of RAM, Logstash and Kibana 1GB each. Swapping is disabled using "swapoff -a". X-Pack and monitoring are enabled. I only have one ES node for this log server. Do I need multiple nodes for this initial bulk import?
EDIT1:
Changing the number_of_replicas and refresh_interval seems to make it perform better. Still testing.
PUT /log-*/_settings
{
  "index.number_of_replicas" : "0",
  "index.refresh_interval" : "-1"
}
Most likely the bottleneck is IO (you can confirm this by running iostat; it would also be useful if you posted ES monitoring screenshots), so you need to reduce the pressure on it.
The default ES configuration generates many index segments during a bulk load. To fix this, increase index.refresh_interval for the duration of the bulk load (or set it to -1) - see the docs. The default value is 1 second, which causes a new segment to be created every second. Also try to increase the batch size and see if it helps.
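The EDIT1 in the question already sets index.refresh_interval to -1 and drops the replicas; just remember to restore them once the import is finished, roughly like this (the values shown are the usual defaults):
PUT /log-*/_settings
{
  "index.refresh_interval" : "1s",
  "index.number_of_replicas" : "1"
}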
Also, if you use spinning disks, set index.merge.scheduler.max_thread_count to 1. This allows only one thread to perform segment merging and reduces IO contention between segment merging and indexing.
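Since index.merge.scheduler.max_thread_count is an index-level setting, it would be applied to the existing log indices roughly like this (if your version refuses to update it dynamically, set it at index creation time or in an index template instead):
PUT /log-*/_settings
{
  "index.merge.scheduler.max_thread_count" : "1"
}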

Not able to disallocate shard from ES cluster

I have created an ES cluster with ES running on three different machines. In order to make them a cluster, I added the unicast config below to the elasticsearch.yml config file on all 3 machines.
discovery.zen.ping.unicast.hosts: [IP1, IP2, IP3]
When I run
curl -XGET localhost:9200/_cluster/health?pretty
I am getting number_of_nodes as 3. Now I want to remove one node from the cluster,
so without changing any config file I ran the command below:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" :{
    "cluster.routing.allocation.exclude._ip" : "IP_adress_of_Node3"
  }
}';
After this I ran the health command again to get the cluster details. The expected output is number_of_nodes = 2, but the result still shows number_of_nodes = 3 even after excluding the node. It would be of great help if someone could tell me what is wrong with the steps I followed for removing the node.
Thanks
The cluster.routing.allocation.exclude._ip setting that you sent to your cluster will not actually remove the node from the cluster, but rather prepare it for removal: it instructs Elasticsearch to move all shards held on this node away from it and store them on other nodes instead.
This allows you to then remove the node once it is empty, without causing under-replication of the shards stored on this node.
To actually remove the node from your cluster you would need to remove it from your list of unicast hosts. Of course you can also just shut it down and leave it in the list until you next need to restart your cluster anyway; as far as I am aware, that won't hurt anything.
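To follow the drain and then clean up afterwards, something along these lines works: watch the shard list until nothing is left on the excluded node, then remove the exclusion again.
GET _cat/shards?v
PUT _cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : null
  }
}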

Elasticsearch delete multiple snapshots

I am trying to delete multiple snapshots. I tried separating them with a comma, but it didn't work, and the documentation doesn't seem to cover this. I also can't seem to see the deletion in progress when I query the status API. Is there a way to see the status of a snapshot deletion?
Edit: Using ES 1.7
As of ES 6.1.0, you can't check the deletion progress of a snapshot.
However, you can use the /_cat/tasks API and look for tasks with the action snapshot/delete to check whether a snapshot is being deleted.
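For example, either of these should show a running deletion (the task list no longer contains a snapshot/delete entry once it has finished):
GET _cat/tasks?v
GET _tasks?actions=*snapshot*&detailed=true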
@tudalex,
As of ES 6.3, you cannot delete multiple snapshots at a time. If you try to delete another snapshot while your previous snapshot is still being deleted, you will get the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "concurrent_snapshot_execution_exception",
        "reason" : "[snapshotXX] another snapshot is currently running cannot delete"
      }
    ],
    "type" : "concurrent_snapshot_execution_exception",
    "reason" : "[snapshotXX] another snapshot is currently running cannot delete"
  },
  "status" : 503
}
You can keep a retry mechanism which polls ES for any running snapshot deletion task; if none is running, you can initiate deletion of the next snapshot. You can do something like the following:
try:
    curator.DeleteSnapshots(snapshot_list, retry_interval=30, retry_count=5).do_action()
except (curator.exceptions.SnapshotInProgress, curator.exceptions.NoSnapshots, curator.exceptions.FailedExecution) as e:
    print(e)
The above code retries deleting the snapshot every 30 seconds, up to 5 times, in case it is unable to delete the snapshot in a previous attempt.
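Without curator, the same one-at-a-time approach works directly against the REST API (repository and snapshot names below are placeholders): delete a snapshot, wait until the task list shows no running snapshot/delete task, then delete the next one.
DELETE _snapshot/my_repo/snapshot_1
DELETE _snapshot/my_repo/snapshot_2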

Are ElasticSearch scripts safe for concurrency issues?

I'm running a process which updates user documents on ElasticSearch. This process can run in multiple instances on different machines. If 2 instances try to run a script to update the same document at the same time, can some of the data be lost because of a race condition? Or is the internal script mechanism safe (using the version property for optimistic locking or some other way)?
The official ES scripts documentation
Using the version attribute is safe for that kind of job.
Do the search with version: true
GET /index/type/_search
{
  "version": true,
  your_query...
}
Then for the update, add a version attribute corresponding to the number returned during the search.
POST /index/type/the_id_to_update/_update?version=3 // <- returned by the search
{
  "doc": {
    "ok": "name"
  }
}
https://www.elastic.co/guide/en/elasticsearch/guide/current/version-control.html
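For scripted updates specifically, which is what the question asks about, the _update API also accepts a retry_on_conflict parameter so that a version conflict is retried instead of failing; a sketch with made-up field names:
POST /index/type/the_id_to_update/_update?retry_on_conflict=3
{
  "script" : {
    "source" : "ctx._source.counter += params.delta",
    "params" : { "delta" : 1 }
  }
}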
