Repair Elasticsearch snapshot with missing meta and snap files

I'm using elasticsearch 5.3.3.
A couple of years ago I created some index snapshots and uploaded them to S3 as a backup.
I recently needed to restore this backup and noticed that my snapshot is missing the meta info.
Comparing to a working snapshot I see these missing files:
meta-*
snap-*
index-0
index.latest
I know the data must be there because of the size of the directory, and because of text I see when I open some of the files, which I believe are Lucene segments.
I'm trying to find a way to recover the data by rebuilding the index somehow, but I can't find good info about this. I even tried some low-level Lucene functions on the segments, but it seems the snapshot doesn't store a segments* file for Lucene to read.
Any idea how I can recover this data?
Some info:
Elasticsearch 5.3.3
Lucene 6.4.2
Single index snapshot
Hope there's a way to recover the data.
Thank you all.

Related

Archive old data from Elasticsearch to Google Cloud Storage

I have an Elasticsearch server installed on a Google Compute Engine instance. A huge amount of data is ingested every minute, and the underlying disk fills up pretty quickly.
I understand we can increase the size of the disks, but that would cost a lot for storing long-term data.
We need 90 days of data in the Elasticsearch server (Compute Engine disk) and data older than 90 days (up to 7 years) stored in Google Cloud Storage buckets. The older data should be retrievable in case it's needed for later analysis.
One way I know is to take snapshots frequently and delete the indices older than 90 days from Elasticsearch server using Curator. This way I can keep the disks free and minimize the storage cost.
Is there any other way this can be done without manually automating the above-mentioned idea?
For example, something provided by Elasticsearch out of the box that archives data older than 90 days by itself and keeps the data files on disk; we could then manually move these files from the disk to Google Cloud Storage.
There is no way around it: to make backups of your data you need to use the snapshot/restore API; it is the only safe and reliable option available.
There is a plugin to use google cloud storage as a repository.
If you are using version 7.5+ and Kibana with the basic license, you can configure the Snapshot directly from the Kibana interface, if you are on an older version or do not have Kibana you will need to rely on Curator or a custom script running with a crontab scheduler.
While you can copy the data directory, you would need to stop your entire cluster every time you want to copy the data, and to restore it you would also need to create a new cluster from scratch every time. That is a lot of work and not practical when you have something like the snapshot/restore API.
Look into Snapshot Lifecycle Management and Index Lifecycle Management. They are available with a Basic license.
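As a minimal sketch of the plugin-based setup described above, assuming a repository named my_gcs_repo and a bucket named my-es-backups (both hypothetical placeholders), and assuming the repository-gcs plugin is already installed on every node:

```shell
# Register a Google Cloud Storage bucket as a snapshot repository
# (requires the repository-gcs plugin on all nodes).
curl -X PUT "localhost:9200/_snapshot/my_gcs_repo" \
  -H 'Content-Type: application/json' \
  -d '{"type": "gcs", "settings": {"bucket": "my-es-backups"}}'

# Take a snapshot into that repository; once it succeeds, the
# snapshotted indices can safely be deleted from the cluster.
curl -X PUT "localhost:9200/_snapshot/my_gcs_repo/snapshot_1?wait_for_completion=true"
```

On 7.5+ the snapshot step can be scheduled with an SLM policy instead of Curator or cron.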

Unable to restore elasticsearch snapshots from S3 using the repository-s3 plugin after nodes data directory got deleted from all nodes

I am using repository-s3 plugin for snapshot and restore with elasticsearch 7.5.1.
I created snapshot policies and took snapshots of specific indices, and confirmed that they existed in my S3 bucket. Then, for some reason, I had to delete the data from all my nodes manually, so I ran
rm -r /var/lib/elasticsearch/nodes/0/ on every node in my cluster.
Now when I go back to the Snapshot and Restore tab in Kibana, it doesn't show my old snapshots, and I am not able to restore my indices even though they are present in my S3 bucket.
I need to restore the indices and need help with the same.
By removing /var/lib/elasticsearch/nodes/0/ you also deleted your cluster state (i.e. the _state sub-folder that sits next to the indices sub-folder) which also happens to contain your repository definitions.
You should never delete data directly from the filesystem unless you know what you're doing.
If you need space, just run DELETE * from Dev Tools, but don't venture into the file system.
What you can try to do now is to recreate your S3 repository anew with the exact same settings as before, that will restore the repository in your cluster state and you might be able to see your old snapshots.
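For example, assuming the original repository was named my_s3_repo and backed by the bucket my-bucket (substitute your real names and settings), re-registering it and checking for the old snapshots would look roughly like:

```shell
# Re-register the S3 repository with the exact same settings it had before;
# Elasticsearch will pick up the existing snapshot files in the bucket.
curl -X PUT "localhost:9200/_snapshot/my_s3_repo" \
  -H 'Content-Type: application/json' \
  -d '{"type": "s3", "settings": {"bucket": "my-bucket"}}'

# If the settings match, the old snapshots should be listed again:
curl -X GET "localhost:9200/_snapshot/my_s3_repo/_all?pretty"

# Then a restore can be triggered for the missing indices
# (my_snapshot and my-index-* are placeholders):
curl -X POST "localhost:9200/_snapshot/my_s3_repo/my_snapshot/_restore" \
  -H 'Content-Type: application/json' \
  -d '{"indices": "my-index-*"}'
```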

ELK logstash cant create index in ES

After following this tutorial (https://www.bmc.com/blogs/elasticsearch-logs-beats-logstash/) to use Logstash to analyze some log files, my index was created fine the first time. Then I wanted to re-index new files with new filters and new repositories, so I deleted the index via "curl -XDELETE". Now when I restart Logstash and Filebeat, the index is not created anymore. I don't see any errors while launching the components.
Do I need to delete something else in order to re-create my index?
Ok since my guess (see comments) was correct, here's the explanation:
To avoid that filebeat reads and publishes lines of a file over and over again, it uses a registry to store the current state of the harvester:
The registry file stores the state and location information that Filebeat uses to track where it was last reading.
As you stated, filebeat successfully harvested the files, sent the lines to logstash and logstash published the events to elasticsearch which created the desired index. Since filebeat updated its registry, no more lines had to be harvested and thus no events were published to logstash again, even when you deleted the index. When you inserted some new lines, filebeat reopened the harvester and published only the new lines (which came after the "registry checkpoint") to logstash.
The default location of the registry file is ${path.data}/registry (see Filebeat's Directory Layout Overview).
... maybe the curl api call is not the best solution to restart the index
This has nothing to do with deleting the index. Deleting the index happens inside elasticsearch. Filebeat has no clue about your actions in elasticsearch.
Q: Is there a way to re-create an index based on old logs?
Yes, there are some ways you should take into consideration:
You can use the reindex API which copies documents from one index to another. You can update the documents while reindexing them into the new index.
In contrast to the reindex you can use the update by query API to update documents that will remain in the original index.
Lastly, you could of course delete the registry file. However, this could cause data loss. But for development purposes I guess that's fine.
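As a sketch of the first two options, assuming hypothetical index names old-index and new-index:

```shell
# Copy all documents from old-index into new-index with the reindex API
# (documents can also be transformed with a script during the copy).
curl -X POST "localhost:9200/_reindex" \
  -H 'Content-Type: application/json' \
  -d '{"source": {"index": "old-index"}, "dest": {"index": "new-index"}}'

# Or update documents in place so they remain in the original index:
curl -X POST "localhost:9200/old-index/_update_by_query?conflicts=proceed"
```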
Hope I could help you.

Is there any way to restore Elasticsearch snapshots apart from using the Elasticsearch restore API?

My company wants to use an existing Elasticsearch snapshot repository (consisting of several hundred gigabytes) to obtain the original documents and store them elsewhere. I should note that the snapshots were taken using the Elasticsearch snapshot API.
My company is somewhat reluctant to use Elasticsearch to restore the snapshots, as they fear it would involve creating a new Elasticsearch cluster that would consume considerable resources. So far I have not seen any way to restore the snapshots other than using Elasticsearch, but given my company's insistence, I ask here: is there any other tool I could use to restore said snapshots? Thank you in advance for any help resolving this issue.
What I would do in your shoes is spin up a local cluster and restore the existing snapshot into it (here is the relevant Elastic documentation: Restoring to a different cluster). Then, from there, I would either export the data using the Kibana Reporting plugin (https://www.elastic.co/what-is/kibana-reporting), or write a Logstash pipeline to export the data from the local cluster to, say, a CSV file. The snapshot format is internal to Elasticsearch, so there is no supported third-party tool that reads it directly.

Getting old indexed elasticsearch data

I have a lot of data indexed in my Elasticsearch.
I deleted the elasticsearch folder, extracted a fresh zip of Elasticsearch, and started the Elasticsearch server again.
To my surprise, after starting the new Elasticsearch server, I found all the old data again, and this problem keeps recurring.
Can anyone please help me? I don't want all the old data indexed in Elasticsearch.
Regards
Given the cluster health response, it's not a problem with multiple nodes running in the same cluster, as suggested by Igor. I'd suggest you check the running Java processes. You could have an Elasticsearch instance hanging somewhere that keeps writing to that folder.
