Snapshot exists but still gives snapshot restore exception in Elasticsearch - elasticsearch

I took an Elasticsearch snapshot (snapshot_1) of 4 indices on an EC2 server "A" and copied the data to another EC2 server "B". I updated the path in elasticsearch.yml and restarted ES on server B. (I updated the path and restarted ES before copying the data to that path, but the path already existed and had the required access.)
Upon querying the index file in the snapshot directory I do see that snapshot_1 exists.
[elasticsearch@2ed2c2eaa5be uat_dump]$ cat index-0
{"snapshots":[{"name":"snapshot_1","uuid":"bBc6chD0TCKiQvuqn8gsow","state":1}],"indices":{"my_index_1":{"id":"J-c4ZvN0T02HeyQR8ueyZw","snapshots":["bBc6chD0TCKiQvuqn8gsow"]},"my_index_2":{"id":"ifn1Geq2RHe6wAMuGxpAMw","snapshots":["bBc6chD0TCKiQvuqn8gsow"]},"my_index_3":{"id":"X9dPrB3fRd-WrfNnZN69mQ","snapshots":["bBc6chD0TCKiQvuqn8gsow"]},"my_index_4":{"id":"9OjzD37WRROJFkfu-N7LNg","snapshots":["bBc6chD0TCKiQvuqn8gsow"]}}}
But when I try to restore the snapshot, I receive this error:
{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[my_backup:snapshot_1] snapshot does not exist"
      }
    ],
    "type": "snapshot_restore_exception",
    "reason": "[my_backup:snapshot_1] snapshot does not exist"
  },
  "status": 500
}
There is also a file named incompatible-snapshots, and its contents are:
{"incompatible-snapshots":[]}
Any pointers on what I could be doing wrong?
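For reference, restoring from a copied filesystem repository normally requires registering the repository on the new cluster before calling restore. A minimal sketch, assuming the copied files live in /path/to/uat_dump on server B, that path is listed under path.repo, and the repository name my_backup is a placeholder:
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/path/to/uat_dump"
  }
}
POST _snapshot/my_backup/snapshot_1/_restore
{
  "indices": "my_index_1,my_index_2,my_index_3,my_index_4"
}
One common cause of a "snapshot does not exist" error is a registered repository whose location does not point at the directory that actually contains index-0.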

Related

How to create GCP ElasticSearch Service automated snapshots/backups?

I was under the impression that the GCP Elasticsearch service comes with automated snapshots/backups; that's what I find in the documentation. It suggests they happen once a day and are stored in storage, but I do not see any backups in any of my GCP storage. How do you get access to the automated snapshots?
Try the below command in Dev Tools:
GET _cat/snapshots/cs-automated?v
Output error message:
{
  "error": {
    "root_cause": [
      {
        "type": "repository_missing_exception",
        "reason": "[cs-automated] missing"
      }
    ],
    "type": "repository_missing_exception",
    "reason": "[cs-automated] missing"
  },
  "status": 404
}
Try:
GET /_snapshot/found-snapshots
Or
GET _cat/snapshots/found-snapshots?v
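If found-snapshots does turn out to be the repository that holds the automated backups, listing and restoring from it follows the usual snapshot API pattern; a minimal sketch, where <snapshot_name> is whichever snapshot the listing returns:
GET _snapshot/found-snapshots/_all
POST _snapshot/found-snapshots/<snapshot_name>/_restore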

Google Drive API - insufficientFilePermissions error when downloading a file

I am working on a project which downloads files using the Google Drive API. I am using a service account that has the full Drive scope (https://www.googleapis.com/auth/drive).
I am able to download some files without any problems, but sometimes I get the following error:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "insufficientFilePermissions",
        "message": "The user does not have sufficient permissions for this file."
      }
    ],
    "code": 403,
    "message": "The user does not have sufficient permissions for this file."
  }
}
When I attempt to download a file, I impersonate the owner of the file. The owner definitely has access to the file, so I am not sure why I am getting this error.
Is anyone able to explain how I could possibly be getting this error?
The problem was that the user I was trying to impersonate was suspended.
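For anyone hitting the same thing, one way to confirm a suspended account is to query the Admin SDK Directory API for the user's suspended flag; a minimal sketch, assuming you hold a token with an admin directory scope and owner@example.com stands in for the file owner:
curl -H "Authorization: Bearer $ADMIN_ACCESS_TOKEN" \
  "https://admin.googleapis.com/admin/directory/v1/users/owner@example.com?fields=suspended"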

Google Cloud Storage Repository Plugin

I have a K8s cluster on GCP running Elasticsearch, and now I need to create a backup.
I've installed the GCS plugin on my pods in the StatefulSet and tried setting it up with the following documentation:
https://github.com/elastic/elasticsearch/blob/master/docs/plugins/repository-gcs.asciidoc
When I try to configure a repository to use credentials stored in the keystore, I get the following response back:
{
  "error": {
    "root_cause": [
      {
        "type": "repository_exception",
        "reason": "[my_backup] repository type [gcs] does not exist"
      }
    ],
    "type": "repository_exception",
    "reason": "[my_backup] repository type [gcs] does not exist"
  },
  "status": 500
}
Any lead would be helpful, thanks!
I think the problem is that I can't install the plugin on the nodes, so I've installed it on the pods instead, and that installation does not persist after I restart the pods. To make the installation persist on K8s I would need to build a custom image that installs the plugin. That's a bit tricky, and the plugin seems to be intended for GCE anyway, so I decided to move from K8s to a managed instance group on GCE instead.
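For anyone who does want to stay on K8s, a minimal Dockerfile sketch for baking the plugin into a custom image (the version tag is a placeholder and must match your cluster's Elasticsearch version):
FROM docker.elastic.co/elasticsearch/elasticsearch:7.10.2
RUN bin/elasticsearch-plugin install --batch repository-gcs
Pointing the StatefulSet at an image built this way keeps the plugin in place across pod restarts.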

How to know the curator version?

I am using Elasticsearch version 2.1.0. How can I know the version of Curator being used?
While changing the settings (number of replicas) I am getting an exception:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Can't update [index.number_of_replicas] on closed indices [[.marvel-es-2016.12.12]] - can leave index in an unopenable state"
      }
    ]
  },
  "status": 400
}
Any clues?
You can get the Curator version by using the following command:
$ curator --version
I think you are trying to set replicas on indices which are in a closed state.
Try setting replicas after opening the indices.
Related information can be found here
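A minimal sketch of that sequence for the index from the error above (the replica count of 1 is just an example value):
POST /.marvel-es-2016.12.12/_open
PUT /.marvel-es-2016.12.12/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}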

Recovering from Consul "No Cluster leader" state

I have:
one mesos-master on which I configured a Consul server;
one mesos-slave on which I configured a Consul client; and
one bootstrap server for Consul.
When I start it up, I see the following errors:
2016/04/21 19:31:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2016/04/21 19:31:44 [ERR] agent: coordinate update error: rpc error: No cluster leader
How do I recover from this state?
Did you look at the Consul docs?
It looks like you have performed an ungraceful stop and now need to clean your raft/peers.json file by removing all entries there to perform an outage recovery. See the above link for more details.
As of Consul 0.7 things work differently from Keyan P's answer. raft/peers.json (in the Consul data dir) has become a manual recovery mechanism. It doesn't exist unless you create it, and then when Consul starts it loads the file and deletes it from the filesystem so it won't be read on future starts. There are instructions in raft/peers.info. Note that if you delete raft/peers.info it won't read raft/peers.json but it will delete it anyway, and it will recreate raft/peers.info. The log will indicate when it's reading and deleting the file separately.
Assuming you've already tried the bootstrap or bootstrap_expect settings, that file might help. The Outage Recovery guide in Keyan P's answer is a helpful link. You create raft/peers.json in the data dir and start Consul, and the log should indicate that it's reading/deleting the file and then it should say something like "cluster leadership acquired". The file contents are:
[ { "id": "<node-id>", "address": "<node-ip>:8300", "non_voter": false } ]
where <node-id> can be found in the node-id file in the data dir.
If your raft protocol version is greater than 2:
[
  {
    "id": "e3a30829-9849-bad7-32bc-11be85a49200",
    "address": "10.88.0.59:8300",
    "non_voter": false
  },
  {
    "id": "326d7d5c-1c78-7d38-a306-e65988d5e9a3",
    "address": "10.88.0.45:8300",
    "non_voter": false
  },
  {
    "id": "a8d60750-4b33-99d7-1185-b3c6d7458d4f",
    "address": "10.233.103.119",
    "non_voter": false
  }
]
In my case I had 2 worker nodes in the k8s cluster; after adding another node, the Consul servers could elect a master and everything was up and running.
I will update what I did:
A little background: we scaled down the AWS Auto Scaling group and so lost the leader, but we still had one server running, just without a leader.
What I did was:
I scaled up to 3 servers (don't use 2 or 4).
Stopped Consul on all 3 servers: sudo service consul stop (you can do status/stop/start).
Created a peers.json file and put it on the old server (/opt/consul/data/raft).
Started the 3 servers (peers.json should be placed on 1 server only).
For the other 2 servers, joined them to the leader using consul join 10.201.8.XXX
Checked that the peers are connected to the leader using consul operator raft list-peers
Sample peers.json file:
[
  {
    "id": "306efa34-1c9c-acff-1226-538vvvvvv",
    "address": "10.201.n.vvv:8300",
    "non_voter": false
  },
  {
    "id": "dbeeffce-c93e-8678-de97-b7",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  },
  {
    "id": "62d77513-e016-946b-e9bf-0149",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  }
]
You can get these ids from each server in /opt/consul/data/:
[root@ip-10-20 data]# ls
checkpoint-signature node-id raft serf
[root@ip-10-1 data]# cat node-id
Some useful commands:
consul members
curl http://ip:8500/v1/status/peers
curl http://ip:8500/v1/status/leader
consul operator raft list-peers
cd /opt/consul/data/raft/
consul info
sudo service consul status
consul catalog services
You may also ensure that the bootstrap parameter is set in your Consul configuration file config.json on the first node:
# /etc/consul/config.json
{
"bootstrap": true,
...
}
or start the consul agent with the -bootstrap=1 option, as described in the official "Failure of a single server cluster" Consul documentation.
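For completeness, a command-line equivalent of that single-server bootstrap, where the data and config paths are placeholders:
consul agent -server -bootstrap -data-dir=/opt/consul/data -config-dir=/etc/consul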
