I was upgrading from Elasticsearch 7.10 to 8.4. I wanted to make a filesystem snapshot, copy the data, install the new version, and restore the data from the snapshot files I created earlier.
I have a setup with two node roles: master and data.
I didn't know that, in such a setup, when Elasticsearch makes a filesystem snapshot, it creates a structure with the raw index data on the data node, something like this:
indices/
    8wPAc89lSrqFunOTSkShSQ/
        0/
            __LHqdmaHLQU6WWpJVlqFY4w
            index-AXVMDc2DQZyBZihEeGOM9g
            snap-7Mv54vkoRjS9YLLgSaokDw.dat
    ...
    I25vR794SZmFJ3TvjF3d-Q/
        0/
            __-f2Sb1onSlaj9XSAhc84LQ
            index-sc-iDaI7TRGX0BKg7Mzk2w
            snap-7Mv54vkoRjS9YLLgSaokDw.dat
and a structure with some metadata on the master node, like this:
index-0
index.latest
indices/
    I25vR794SZmFJ3TvjF3d-Q/
        0/
            meta-oHtfvYQBIjpWMF5xqR1L.dat
meta-7Mv54vkoRjS9YLLgSaokDw.dat
snap-7Mv54vkoRjS9YLLgSaokDw.dat
When I was copying the files, I only copied the ones from the data node (not knowing that Elasticsearch also writes metadata information to the master node). So I now have raw index data without the metadata for it.
I wanted to re-create some of the metadata myself (index-0 is a JSON file with some mappings), but there are also encoded files for each snapshot, so I assume they contain calculated checksums and my approach might not work.
Is there a way to restore all these indices in Elasticsearch without the metadata information?
Unfortunately, I don't think it's possible to rebuild the metadata without knowing everything that needs to go in there.
Also, between 7.10 and 8.4 there have been significant changes in the index format, so 8.4 will probably not be able to read your 7.10 raw files without issues.
Finally, when upgrading from 7.x to 8.4, you must first upgrade to 7.17 before moving to 8.4.
Related
How do people maintain all the changes made to an Elasticsearch index over time, so that if I have to rebuild the index from scratch to match the existing one, I can do so in minutes? Do people keep logs of all the PUT calls made over time to update mappings and other settings?
I guess one way is to use snapshots. A snapshot is a backup taken from a running Elasticsearch cluster or index. You can take a snapshot of an individual index or of the entire cluster and store it in a repository on a shared filesystem. It contains a copy of the on-disk data structures and mappings that make up an index. Besides that, when you create a snapshot of an index, Elasticsearch avoids copying any data that is already stored in the repository as part of an earlier snapshot, so you can rebuild or recover an index from scratch to the last taken snapshot very quickly.
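For illustration, a minimal sketch of that flow with curl (the repository name my_backup and the path /mnt/backups are placeholders; the path must be listed under path.repo in elasticsearch.yml on every node, and the exact request syntax can vary slightly between versions):

# register a shared-filesystem snapshot repository
curl -XPUT "http://localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": { "location": "/mnt/backups" }
}'

# take a snapshot of the whole cluster (add an "indices" list in the body to limit it)
curl -XPUT "http://localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"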
I'm using Elasticsearch 2.4.4 and need to use the snapshot/restore mechanism to back up data. But I have a few questions about it.
1. Can a snapshot be taken without any issues while data is being written into ES?
2. Does it matter which of the master/data/client nodes is used for taking snapshots?
3. Does restore require indices to be closed? If yes, then why?
Yes.
It does not. What matters most is that the storage you want to write the snapshot to is accessible from all cluster nodes, because the data is replicated across your cluster and you don't control which node will back up which shard.
No, you can snapshot open indices.
Read a bit more about it in the snapshot and restore documentation.
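To illustrate the restore side with curl (index, repository, and snapshot names below are placeholders): snapshots can be taken from open indices, but if an index you want to restore already exists in the cluster, it has to be closed or deleted first, e.g.:

# close the existing index so the restore can replace it
curl -XPOST "http://localhost:9200/my-index/_close"

# restore just that index from the snapshot
curl -XPOST "http://localhost:9200/_snapshot/my_backup/snapshot_1/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "my-index"
}'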
Is there a way to find out the names of all the indices ever created, even after an index might have been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.
My application receives and parses thousands of small JSON snippets, each about 1 KB, every hour. I want to create a backup of all incoming JSON snippets.
Is it a good idea to use Elasticsearch to back up these snippets in an index with, for example, "number_of_replicas": 4? I've never read that anyone has used Elasticsearch for this.
Is my data safe in Elasticsearch when I use a cluster of servers and replicas, or should I use other storage for this use case?
(Writing it to the local filesystem isn't safe, as our hard disks crash often. At first I thought about using HDFS, but it isn't made for small files.)
First you need to understand the difference between replicas and backups.
A replica is an extra copy of the data at run time. It increases high availability and failover support, but it won't protect you against accidental deletion of data.
A backup is a copy of the whole data set at backup time. It is used to restore the data when the system crashes.
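As a rough illustration of that difference with curl (index and repository names are placeholders): replicas are just an index setting, while a backup goes through the snapshot API against a previously registered repository:

# replicas: extra live copies of each shard, kept on other nodes
curl -XPUT "http://localhost:9200/json-snippets" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 4 }
}'

# backup: a point-in-time snapshot written to a registered repository
curl -XPUT "http://localhost:9200/_snapshot/my_backup/daily-1?wait_for_completion=true"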
Using Elasticsearch for backup is not a good idea. Elasticsearch is a search engine, not a database. If you haven't configured the ES cluster carefully, you can end up losing data.
So, in my opinion:
To store JSON objects, we have lots of databases. For example, MongoDB is a NoSQL database. We can easily configure it with more replicas, which means high availability of data and failover support. As you asked, it's also open source and quite reliable.
For more info about MongoDB, see https://www.mongodb.org/
Update:
In Elasticsearch, if you create an index with more shards, it will be distributed among the nodes; if a node fails (and you have no replicas), the data on it is lost. In MongoDB, more nodes means each node keeps its own copy of the data, so if one MongoDB node fails we can retrieve the data from the replica nodes. You need to be more careful about replica setup and shard allocation in Elasticsearch, whereas in MongoDB it's easier and a good architecture too.
Note: I didn't say storing data in Elasticsearch is not safe. I mean that, compared to MongoDB, it's harder to configure and maintain replicas in Elasticsearch.
Hope it helps..!
How can I export an Elasticsearch index to a file and then insert that data into another cluster?
I want to move data from one cluster to another but I can't connect them directly.
If you don't need to keep _id the same and the only important bit is _source, you can use Logstash with a config like:
input {
  # from one cluster
}
output {
  # to another cluster
}
here is more info: http://www.logstash.net/docs/1.4.2/
Yes, the method is a bit odd, but I tried it for instant data transfer between clusters, index by index, and it works like a charm (of course, only if you don't need to keep the _id generated by Elasticsearch).
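For reference, a minimal sketch of such a pipeline (host addresses and the index name are placeholders; the option names below are from newer Logstash versions, and older 1.4.x releases used slightly different settings such as host instead of hosts):

input {
  elasticsearch {
    hosts => ["http://source-cluster:9200"]
    index => "my-index"
    # _id values are regenerated on the destination; if you ever need to keep them,
    # set docinfo => true here and document_id => "%{[@metadata][_id]}" in the output
  }
}
output {
  elasticsearch {
    hosts => ["http://destination-cluster:9200"]
    index => "my-index"
  }
}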
There is a script which will help you back up and restore indices from one cluster to another. I haven't tested it, but maybe it will fit your needs.
check this Backup and restore an Elastic search index
And you can also use a Perl script to copy an index from one cluster to another (or within the same cluster).
check this link clintongormley/ElasticSearch.pm
I recently tried my hand at this, and there are a couple of approaches that can help you.
Use Elasticsearch's Snapshot and Restore APIs.
You can take a snapshot at the source cluster and use that snapshot to restore data to your destination cluster.
If your setup allows installing external packages, you can use Elasticdump as well.
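For example, a quick sketch with elasticdump (host addresses and the index name are placeholders):

# copy the mapping first, then the documents
elasticdump --input=http://source-cluster:9200/my-index --output=http://dest-cluster:9200/my-index --type=mapping
elasticdump --input=http://source-cluster:9200/my-index --output=http://dest-cluster:9200/my-index --type=data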
HTH!