Elasticsearch delete multiple snapshots

I am trying to delete multiple snapshots; I tried separating them with a comma, but it didn't work, and the documentation doesn't seem to cover this. I also can't see the deletion in progress when I query the status API. Is there a way to see the status of a snapshot deletion?
Edit: Using ES 1.7

As of ES 6.1.0, you can't check the deletion progress of a snapshot.
However, you can use the /_cat/tasks API and look for tasks with the snapshot/delete action to check whether a snapshot is being deleted.
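For example, here is a minimal polling sketch, assuming a cluster reachable at localhost:9200 and the Python requests library (the address and polling interval are illustrative):

import time
import requests

ES = "http://localhost:9200"  # assumed cluster address

def snapshot_delete_in_progress():
    # _cat/tasks lists running tasks; look for the snapshot delete action
    tasks = requests.get(ES + "/_cat/tasks", params={"format": "json"}).json()
    return any("snapshot/delete" in t.get("action", "") for t in tasks)

while snapshot_delete_in_progress():
    print("snapshot deletion still running...")
    time.sleep(10)
print("no snapshot deletion task found")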

@tudalex,
As of ES 6.3 you cannot delete multiple snapshots at a time. If you try to delete another snapshot while your previous snapshot is still being deleted, you will get the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "concurrent_snapshot_execution_exception",
        "reason" : "[snapshotXX] another snapshot is currently running cannot delete"
      }
    ],
    "type" : "concurrent_snapshot_execution_exception",
    "reason" : "[snapshotXX] another snapshot is currently running cannot delete"
  },
  "status" : 503
}
You can add a retry mechanism that polls ES for any running snapshot deletion task; if none is running, you can initiate deletion of the next snapshot. You can do something like the following with Curator:
try:
    curator.DeleteSnapshots(snapshot_list, retry_interval=30, retry_count=5).do_action()
except (curator.exceptions.SnapshotInProgress,
        curator.exceptions.NoSnapshots,
        curator.exceptions.FailedExecution) as e:
    print(e)
The above code retries deleting the snapshot every 30 seconds, up to 5 times, in case it was unable to delete the snapshot in a previous attempt.
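If you are not using Curator, a rough equivalent with plain HTTP calls looks like the sketch below (assuming the Python requests library; the repository and snapshot names are illustrative):

import time
import requests

ES = "http://localhost:9200"                 # assumed cluster address
REPO = "my_repo"                             # illustrative repository name
SNAPSHOTS = ["snapshot_1", "snapshot_2"]     # illustrative snapshot names

def deletion_running():
    # Any running task whose action contains snapshot/delete means a deletion is in flight
    tasks = requests.get(ES + "/_cat/tasks", params={"format": "json"}).json()
    return any("snapshot/delete" in t.get("action", "") for t in tasks)

for snap in SNAPSHOTS:
    # Wait until no other deletion is running before starting the next one
    while deletion_running():
        time.sleep(30)
    resp = requests.delete(ES + "/_snapshot/" + REPO + "/" + snap)
    print(snap, resp.status_code)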

Related

Enrich policy - Could not obtain lock because policy execution is already in progress

I accidentally created an enrich policy with a typo
PUT /_enrich/policy/grooup-info
and then I ran execute on the enrich policy
PUT /_enrich/policy/group-info/_execute
When I noticed the typo in the policy name, I removed it, created the enrich policy without the typo, and tried to run the execute again, but now I am getting this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "es_rejected_execution_exception",
        "reason" : "Could not obtain lock because policy execution for [group-info] is already in progress."
      }
    ],
    "type" : "es_rejected_execution_exception",
    "reason" : "Could not obtain lock because policy execution for [group-info] is already in progress."
  },
  "status" : 429
}
Is there any way to fix this error?
Please let me know. Thanks for your time.
To my knowledge, it's not possible to execute an enrich policy that does not exist; you would get an error like Could not locate policy with id [group-info] when running your second command the first time.
So it's only possible because the group-info policy already existed when you executed it the first time. And if that's the case, those policy executions are not cancellable, so you'll need to wait until it's finished, delete it with DELETE /_enrich/policy/group-info, recreate it and re-execute it properly.
You can monitor the policy execution with
GET _tasks?actions=policy*
or
GET _enrich/_stats
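Putting that recovery flow together, here is a minimal sketch using the Python requests library (the cluster address and the policy body are assumptions for illustration):

import time
import requests

ES = "http://localhost:9200"   # assumed cluster address
POLICY = "group-info"          # policy name from the question

# 1. Wait until no enrich policy execution task is running any more
while True:
    tasks = requests.get(ES + "/_tasks", params={"actions": "policy*"}).json()
    if not any(node.get("tasks") for node in tasks.get("nodes", {}).values()):
        break
    time.sleep(10)

# 2. Delete the policy and recreate it with the correct definition
requests.delete(ES + "/_enrich/policy/" + POLICY)
policy_body = {                # illustrative policy definition
    "match": {
        "indices": "groups",
        "match_field": "group_id",
        "enrich_fields": ["group_name"],
    }
}
requests.put(ES + "/_enrich/policy/" + POLICY, json=policy_body)

# 3. Execute it again
requests.put(ES + "/_enrich/policy/" + POLICY + "/_execute")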

Scroll contexts are left open and they never get deleted or expired in Elasticsearch v7.3?

I am using ES v7.3 and using slicing to stream data from ES, but what I observe is that after we stream the data a few times, some scroll contexts are left open; they remain open for days and never expire or get killed, so the searches keep running and we observe high CPU spikes. We also get the following message in the logs:
[2020-02-07T06:49:33,559][DEBUG][o.e.a.s.TransportSearchScrollAction] [ip-1-0-104-220] [1234717] Failed to execute query phase
org.elasticsearch.transport.RemoteTransportException: [ip-1-0-104-220][1.0.104.220:9300][indices:data/read/search[phase/query/scroll]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [1234717]
at org.elasticsearch.search.SearchService.getExecutor(SearchService.java:462) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.search.SearchService.runAsync(SearchService.java:344) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:401) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.action.search.SearchTransportService.lambda$registerRequestHandler$10(SearchTransportService.java:367) ~[elasticsearch-7.3.1.jar:7.3.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) [x-pack-security-7.3.1.jar:7.3.1]
Could anyone advise whether we are missing any index-level setting we should enforce so that these open contexts get killed or expire after a timeout is reached?
To check and delete the open contexts I am using the following commands, respectively:
GET _nodes/stats/indices?filter_path=**.open_contexts
DELETE /_search/scroll/_all
Moreover, my timeouts are:
exports.ELASTICSEARCH = {
  PARALLEL_SLICES: 2,
  SCROLL_ALIVE_TIME: '5m',
  SLICE_ALIVE_TIME: '1m',
  SCROLL_SIZE: 10000,
  REQUEST_RETRY_COUNT: 5,
  REQUEST_TIMEOUT: 120000, // in milliseconds
  ERROR_RETRY_COUNT: 3
};
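For reference, a minimal sketch of how the streaming code could clear each scroll context explicitly instead of relying on DELETE /_search/scroll/_all (Python requests; the index name, sizes, and query are illustrative):

import requests

ES = "http://localhost:9200"   # assumed cluster address
INDEX = "my-index"             # illustrative index name

def stream_slice(slice_id, max_slices, scroll="5m", size=1000):
    # Stream one slice of a sliced scroll and always clear its context afterwards
    body = {
        "slice": {"id": slice_id, "max": max_slices},
        "size": size,
        "query": {"match_all": {}},
    }
    resp = requests.post(ES + "/" + INDEX + "/_search",
                         params={"scroll": scroll}, json=body).json()
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = requests.post(ES + "/_search/scroll",
                                 json={"scroll": scroll, "scroll_id": scroll_id}).json()
            scroll_id = resp.get("_scroll_id")
    finally:
        # Explicitly free the scroll context, even if the consumer stops early
        if scroll_id:
            requests.delete(ES + "/_search/scroll", json={"scroll_id": [scroll_id]})

for hit in stream_slice(0, 2):
    pass  # process each hit here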

Elasticsearch - How to remove stuck persistent setting after upgrade

I have just upgraded my cluster from 5.6 to 6.1. I did a rolling upgrade as the documentation specified. It looks like a setting that I was using isn't available anymore in 6.1. That would've been fine, but now I can't even re-enable shard allocation, so my last node won't allocate its shards. Doing something as simple as this:
curl -XPUT 'localhost:9200/_cluster/settings?pretty' -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
results in this:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[inoreader-es4][92.247.179.253:9300][cluster:admin/settings/update]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unknown setting [indices.store.throttle.max_bytes_per_sec] did you mean [indices.recovery.max_bytes_per_sec]?"
  },
  "status" : 400
}
No matter what setting I try to change I always get this error.
Yes, I did set indices.store.throttle.max_bytes_per_sec as a persistent setting once in 5.x, and I'm OK with having to set it under a new name now, but how can I even remove it? It's not in elasticsearch.yml.
You'll need to unset that value. If you are still on the old version, you can use the following command (unsetting with null was added in 5.0):
PUT _cluster/settings
{
  "persistent": {
    "indices.store.throttle.max_bytes_per_sec": null
  }
}
This will, however, fail with a "persistent setting [indices.store.throttle.max_bytes_per_sec], not recognized" error if you have already upgraded your cluster.
At the moment (Elasticsearch version 6.1.1) the removed setting will be archived under archived.indices.store.throttle.max_bytes_per_sec. You can remove this and any other archived setting with:
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
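To confirm what is actually stored before and after, you can list the flattened persistent settings; a small sketch of the same calls from Python (assuming the requests library and a cluster on localhost:9200):

import requests

ES = "http://localhost:9200"   # assumed cluster address

# List persistent settings flattened, so archived.* keys are easy to spot
before = requests.get(ES + "/_cluster/settings",
                      params={"flat_settings": "true"}).json()
print(before["persistent"])

# Clear every archived setting (Python None serializes to JSON null)
requests.put(ES + "/_cluster/settings",
             json={"persistent": {"archived.*": None}})

after = requests.get(ES + "/_cluster/settings",
                     params={"flat_settings": "true"}).json()
print(after["persistent"])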
However, there is a bug that only lets you unset archived settings before you change any other settings.
If you have already changed other settings and are affected by this bug, the only solution is to downgrade to 5.6 again, unset the setting (command at the top of this answer), and then do the upgrade again. It's probably enough to do this on one node (stop all others) as long as it's the master and all other nodes join that master and accept its corrected cluster state. Be sure to take a snapshot beforehand in any case.
For future versions the archived.* behavior will probably change as stated in the ticket (though it's just in the planning phase right now):
[...] we should not archive unknown and broken cluster settings.
Instead, we should fail to recover the cluster state. The solution for
users in an upgrade case would be to rollback to the previous version,
address the settings that would be unknown or broken in the next major
version, and then proceed with the upgrade.
Manually editing or even deleting the cluster state on disk sounds very risky: the cluster state includes a lot of information (check for yourself with GET /_cluster/state) like templates, indices, the routing table,... Even if you still have the data on the data nodes but lost the cluster state, you wouldn't be able to access your data (the "map" of how to form indices out of the shards is missing). If I remember correctly, in more recent ES versions the data nodes cache the cluster state and will try to restore from it, but that's a last resort and I wouldn't want to rely on it. I'm also not sure whether that might bring back your bad setting.
PS: I can highly recommend the free Upgrade Assistant when going from 5.6 to 6.x.

Are ElasticSearch scripts safe for concurrency issues?

I'm running a process which updates user documents on ElasticSearch. This process can run in multiple instances on different machines. If 2 instances try to run a script that updates the same document at the same time, can some of the data be lost because of a race condition? Or is the internal script mechanism safe (using the version property for optimistic locking, or some other way)?
The official ES scripts documentation
Using the version attribute is safe for this kind of job.
Do the search with version: true
GET /index/type/_search
{
  "version": true,
  "query": { ... your query ... }
}
Then for the update, add a version attribute corresponding to the number returned during the search.
POST /index/type/the_id_to_update/_update?version=3   // <- version returned by the search
{
  "doc": {
    "ok": "name"
  }
}
https://www.elastic.co/guide/en/elasticsearch/guide/current/version-control.html
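For completeness, a sketch of that flow from Python with the requests library, using the older type-based URLs shown above (names are taken from the examples; on recent ES versions you would condition on if_seq_no / if_primary_term instead of version):

import requests

ES = "http://localhost:9200"                               # assumed cluster address
INDEX, TYPE, DOC_ID = "index", "type", "the_id_to_update"  # names from the answer

# 1. Search with version numbers included in the hits
hits = requests.post(ES + "/" + INDEX + "/" + TYPE + "/_search",
                     json={"version": True,
                           "query": {"term": {"_id": DOC_ID}}}).json()["hits"]["hits"]
doc_version = hits[0]["_version"]

# 2. Send the partial update conditioned on that version; a concurrent writer
#    that bumped the version first makes this call fail with a 409 conflict
resp = requests.post(ES + "/" + INDEX + "/" + TYPE + "/" + DOC_ID + "/_update",
                     params={"version": doc_version},
                     json={"doc": {"ok": "name"}})
if resp.status_code == 409:
    print("version conflict - reload the document and retry")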

percolate returns empty matches under heavy load during elasticsearch cluster resizing

We have an Elasticsearch cluster that dynamically resizes with respect to the percolate message count in a RabbitMQ queue.
We have a single shard and ~18K queries in our index, and we use auto_expand_replicas: "0-all" in the index settings to copy the single shard to all nodes when a node becomes available.
But during heavy load and cluster resizing, some requests produce unexpected empty matches.
We send ~1M percolate requests daily and we were losing ~1K of them. We added a cluster status check to our code: if the cluster status is not green before and after a percolate request, we wait for green status and re-send the percolate request. We were able to reduce the lost content count from ~1K to ~100 this way. We do not see this problem in a cluster with a fixed node size.
Unfortunately, no loss is acceptable in our scenario, and we don't want to give up auto-scaling, so we need to find a workaround for this problem.
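A minimal sketch of that wait-for-green-and-retry workaround (Python requests; the cluster address, document body, and retry count are illustrative):

import requests

ES = "http://localhost:9200"            # assumed cluster address
INDEX, TYPE = "my-index", "my-type"     # names from the repro script below

def wait_for_green(timeout="30s"):
    # Block until the cluster reports green status (or the timeout expires)
    requests.get(ES + "/_cluster/health",
                 params={"wait_for_status": "green", "timeout": timeout})

def percolate_with_retry(doc, retries=3):
    resp = None
    for _ in range(retries):
        wait_for_green()
        resp = requests.get(ES + "/" + INDEX + "/" + TYPE + "/_percolate",
                            json={"doc": doc}).json()
        health = requests.get(ES + "/_cluster/health").json()
        # Only trust the response if the cluster was still green after the call
        if health.get("status") == "green":
            return resp
    return resp

matches = percolate_with_retry({"message": "A new bonsai tree in the office"})
print(matches)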
To reproduce the problem, you can use the following bash script:
https://gist.github.com/ekesken/de41598a1e7e54c6f33c
This script will download and install Elasticsearch 1.5.2 in your current directory, create a cluster with 10 nodes on your local machine, create the index and percolation queries, and start testing.
Normally we expect the following output for a single percolate request:
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "message" : "A new bonsai tree in the office"
  }
}'
{"took":95,"_shards":{"total":1,"successful":1,"failed":0},"total":1,"matches":[{"_index":"my-index","_id":"tree"}]}
After running the script, if the http://localhost:9200/_cat/shards response shows that all shards on all nodes are started and the test script is still running, that means you couldn't reproduce the problem; try increasing the node count, which is 10 by default:
./repeat_percolation_loss.sh 15 test-only
When you reproduce the problem, the script will exit with the following output:
{"took":209,"_shards":{"total":1,"successful":1,"failed":0},"total":0,"matches":[]}
Problem repeated! Congratulations.
You can shut down all servers and clean up all directories and files created by the script with:
./repeat_percolation_loss.sh 15 clean
Replace the node count above with the last node count you tried.
