Elasticsearch snapshot restore from S3 failed with RepositoryMissingException - elasticsearch

I was able to create the repository successfully and to list the snapshots, which suggested the repository could not actually be missing.
Yet the restore request failed with a RepositoryMissingException, with the following details:
shard has failed to be restored from the snapshot [lb-es-snapshots:snapshot-
2022-01-09/gnA_ObsiRmOA-ydXfZWfbA] because of [failed shard on node [RsQQ6-
L6R_6qTIJigizMXQ]: failed recovery, failure RecoveryFailedException[[api][0]: Recovery
failed on {my-release-elasticsearch-data-0}{RsQQ6-L6R_6qTIJigizMXQ}
{w3E20XKZTHyAvpI7XEogjQ}{my-release-elasticsearch-data-0.my-release-elasticsearch-data-
hl.default.svc.cluster.local}{10.244.1.24:9300}{d}{xpack.installed=true,
transform.node=false}]; nested: RepositoryMissingException[[lb-es-snapshots] missing]; ]
- manually close or delete the index [api] in order to retry to restore the snapshot
again or use the reroute API to force the allocation of an empty primary shard
Is there a way to make sense of this error? The logs on the nodes show the same exception.

Answering my own question:
Before registering the repository, you need to manually add the S3 access key and secret key to the Elasticsearch keystore on BOTH the first master node and the first data node (this is not mentioned in the ES documentation), and then reload the secure settings.
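As a rough sketch of those steps (assuming the default S3 client name and that you can get a shell on each node, e.g. via kubectl exec in a deployment like this one; adjust the host and any authentication to your cluster):

# on each node that needs the credentials (here: the first master node and the first data node)
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
# then reload secure settings cluster-wide, no restart needed
curl -X POST "http://localhost:9200/_nodes/reload_secure_settings"

Only after that, register the repository and retry the restore.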

How to recover from "Proposed Flow does not contain a Connection with ID xxx but this instance has data queued in that connection"?

One of my NiFi nodes/instances is refusing to reconnect to the cluster:
Proposed flow is not inheritable by the flow controller and cannot completely replace the current flow due to: Proposed Flow does not contain a Connection with ID 4d2c4e9d-0176-1000-0000-0000310c611f but this instance has data queued in that connection, updateId=307]
Without going into why this happened, how can I recover from this error? Even if I overwrite the flow.xml.gz file, the node refuses to accept it because it knows that there is data queued for that connection.
Can I flush or delete that data somehow?
I have tried deleting/moving the following:
flow.xml.gz
flowfile_repository
content_repository
database_repository
But I get the same error on startup. Where does NiFi track that connection 4d2c4e9d-0176-1000-0000-0000310c611f had data queued on this node?
Deleting the flow.xml.gz file (back it up first) should fix it.
Make sure that you are actually moving/deleting the right flow.xml.gz file, since it may not be in the default location.
Check the actual location of the flow file in $NIFI_HOME/conf/nifi.properties (look for nifi.flow.configuration.file), then delete that one (backup first) and the node should be able to reconnect.
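As a small sketch of that check (assuming $NIFI_HOME points at your NiFi install and the node is stopped; the path reported by the property is the one to move aside):

# find where this node actually keeps its flow definition
grep nifi.flow.configuration.file $NIFI_HOME/conf/nifi.properties
# back up and remove the reported file, e.g. if the property points at ./conf/flow.xml.gz
mv $NIFI_HOME/conf/flow.xml.gz $NIFI_HOME/conf/flow.xml.gz.bak

On the next startup the node should inherit the flow from the cluster instead of insisting on its stale local copy.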

Elasticsearch not opening in cloud URL

I am getting the error below while opening the Elastic Cloud URL:
Error
at Fetch._callee3$ (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/commons.bundle.js:3:232)
at l (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:970406)
at Generator._invoke (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:232)
at Generator.forEach.e.<computed> [as next] (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/kbn-ui-shared-deps/kbn-ui-shared-deps.js:288:970763)
at asyncGeneratorStep (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/commons.bundle.js:3:3991504)
at _next (https://id.eastus2.azure.elastic-cloud.com:9243/bundles/commons.bundle.js:3:3991815)
Also, after this when I reload Elastic Cloud, I get the error below:
{"statusCode":503,"error":"Service Unavailable","message":"No shard available
for [get [.kibana][_doc][space:default]: routing [null]]:
[no_shard_available_action_exception] No shard available for [get [.kibana][_doc][space:default]: routing [null]]"}
Can anyone please help?
The first error message isn't very helpful, but the second one makes it clear that the .kibana index is missing. Do you know why that might have happened? I would generally look at the following options:
If you were using Kibana already and need to restore some visualizations or dashboards, do a partial restore from a snapshot (see the sketch after this list).
If you are ok to start over, restart Kibana (potentially remove it from your deployment and then add it again). That should generally recreate the .kibana index.
If none of that works, contact support.
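A minimal sketch of such a partial restore (the endpoint, repository, and snapshot names below are placeholders; Elastic Cloud typically registers a repository named found-snapshots, and any existing broken .kibana index should be closed or deleted before restoring over it):

curl -X POST "https://<your-es-endpoint>:9243/_snapshot/<repository>/<snapshot>/_restore" \
  -H 'Content-Type: application/json' \
  -d '{"indices": ".kibana", "include_global_state": false}'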

[Elasticsearch]: Unable to Recover Primary Shard

I'm using Elasticsearch version 2.3.5. I have to recover all of the data from the backup disks. Everything was recovered except for 2 shards. While checking the logs, I found the following error.
ERROR:
Caused by: java.nio.file.NoSuchFileException: /data/<cluster_name>/nodes/0/indices/index_name/shard_no/index/_c4_49.liv
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:186)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:83)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:73)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:99)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:435)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:100)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:283)
... 12 more
Can anyone suggest why this is happening, or how I can skip this particular file?
Thanks in advance.
Unfortunately restoring Elasticsearch from a filesystem backup is not a reliable way to recover your data, and is expected to fail like this sometimes. You should always use snapshot and restore instead. Your version is rather old, but more recent versions include this warning in the docs (which also applies to your version):
WARNING: You cannot back up an Elasticsearch cluster by simply copying the data directories of all of its nodes. Elasticsearch may be making changes to the contents of its data directories while it is running; copying its data directories cannot be expected to capture a consistent picture of their contents. If you try to restore a cluster from such a backup, it may fail and report corruption and/or missing files. Alternatively, it may appear to have succeeded though it silently lost some of its data. The only reliable way to back up a cluster is by using the snapshot and restore functionality.
It is possible that the restore has silently lost data in other shards too; there's no way to tell. Assuming you don't also have a snapshot of the data held in the lost shards, the only way to recover it is to reindex it from its source.
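For future backups, a rough sketch of the snapshot workflow this answer recommends (the repository name, location, and snapshot name are placeholders, and the location must be whitelisted under path.repo in elasticsearch.yml on every node):

# register a shared-filesystem snapshot repository
curl -X PUT "http://localhost:9200/_snapshot/my_backup" \
  -H 'Content-Type: application/json' \
  -d '{"type": "fs", "settings": {"location": "/mnt/es_backups"}}'
# take a snapshot of all indices and wait for it to complete
curl -X PUT "http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

Restoring later is then a POST to _snapshot/my_backup/snapshot_1/_restore rather than copying data directories around.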

kubernetes rolling update for elasticsearch

I am performing a simple rolling update of the Elasticsearch image. The command I use is:
kubectl set image deployment master-deployment elasticsearch={private registry}/elasticsearch:{tag}
However, Elasticsearch always hits an IOException after the rolling update.
Caused by: java.io.IOException: failed to read [id:60, legacy:false, file:/var/lib/elasticsearch/nodes/0/_state/global-60.st]
I have checked the directory /var/lib/elasticsearch/nodes/0/_state/. It has a global-10.st file present, but not global-60.st.
How should I make sure the container started from the new image stays in sync with the state files already present?
I think you should go with a StatefulSet and external storage (i.e. a PersistentVolumeClaim); don't store the data inside the pod.
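A minimal sketch of that approach (the names, image tag, data path, and storage size are placeholders; a production Elasticsearch StatefulSet would also need a headless Service, cluster/JVM settings, and more replicas):

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-master
spec:
  serviceName: elasticsearch-master
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch-master
  template:
    metadata:
      labels:
        app: elasticsearch-master
    spec:
      containers:
      - name: elasticsearch
        image: elasticsearch:7.17.10   # replace with {private registry}/elasticsearch:{tag}
        env:
        - name: discovery.type         # single-node only for this sketch
          value: single-node
        volumeMounts:
        - name: data
          mountPath: /var/lib/elasticsearch   # data path seen in the error above; adjust to your image
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
EOF

With volumeClaimTemplates, each replica gets its own PersistentVolumeClaim, so the files under nodes/0/_state survive image updates and pod restarts.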

Unable to perform operation "make branch" in replica <replica name> of VOB <vobname>

We recently changed mastership of a stream from one site (inh) to another (ies). Things were fine until the following error appeared.
Now a delivery from a child branch to the moved branch results in an error. Not all merges are problematic; only certain directories (or so I think) fail to merge.
Unable to perform operation "make branch" in replica "interfaces_src_ies" of VOB "\interfaces_src".
Master replica of branch type "project_subset_QPE-5060" is "interfaces_src.inh".
There is no candidate version which can be checked out.
Unable to check out "M:\dyn_project_subset\interfaces_src\src\java\src\chs\cof\project".
How can I fix this? How can I change the mastership of branch type "project_subset_QPE-5060" to interfaces_src.ies?
That should mean, as detailed in IBM technote swg21142784, that the mastership transfer was incomplete.
That can happen when there was a checked-out file at the time of the transfer.
Make sure there are no checked-out files (on either site), and try to transfer the mastership again (even if it says it is already transferred).
Or, as described in the technote, try to create the branch on the other site, and create a synchronization packet from the mastering site using multitool syncreplica -export, so that the site where the element creation is going to happen receives the mkbranch operation.
You can see that kind of operation in IBM technote swg21118471.
On Windows, this setting can also help prevent this situation:
cleardlg.exe/options/Operations tab/Advanced Options:
When creating an element in a replicated VOB,
make current replica the master of all newly created branches.
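As a rough illustration of the mastership transfer and the synchronization packet (the selectors below are built from the names in the error message; exact option spellings vary by ClearCase release, so double-check them against the cleartool/multitool reference before running):

# from the site that currently masters the branch type, hand mastership to the ies replica
cleartool chmaster replica:interfaces_src.ies@\interfaces_src brtype:project_subset_QPE-5060@\interfaces_src
# then export and ship a synchronization packet so the other site receives the change
multitool syncreplica -export -fship replica:interfaces_src.ies@\interfaces_src

Once the packet has been imported at the ies site, the mkbranch during checkout should stop complaining about the remote master.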
I also had this exact issue when trying to check out a file to modify.
I was able to create a view, but when I tried to check out a file it kept complaining:
Error checking out '<file>'.
Unable to perform operation "make branch" in replica "<branch>" of VOB "<vob>".
Master replica of branch type "<type>" is "<X>"
Unable to check out "<file>"
This was fixed by changing the ClearCase Registry Server to the correct host and then re-creating the view.
