I have just spent the best part of 12 hours indexing 70 million documents into Elasticsearch (1.4) on a single-node, single-server setup on an EC2 Ubuntu 14.04 box. This completed successfully; however, before taking a snapshot of my server I thought it would be wise to rename the cluster to prevent it from accidentally joining production boxes in the future. What a mistake that was! After renaming it in elasticsearch.yml and restarting the ES service, my indexes have disappeared.
I saw the data was still present in the data dir under the old cluster name, so I tried stopping ES, moving the data manually in the filesystem, and then starting the ES service again, but still no luck. I then tried renaming back to the old cluster name and putting everything back in place, and still nothing. The data is still there, all 44 GB of it, but I have no idea how to get it back. I have spent the past 2 hours searching and all I can seem to find is advice on how to restore from a snapshot, which I don't have. Any advice would be hugely appreciated; I really hope I haven't lost a day's work. I will never rename a cluster again!
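For context, ES 1.x keeps its data under <path.data>/<cluster.name>/nodes/<n>, so the layout in my data dir looks roughly like this (cluster names and the base path are placeholders for mine):

> ls /var/lib/elasticsearch/old-cluster-name/nodes/0/indices   # old name: the 44 GB of indices are here
> ls /var/lib/elasticsearch/new-cluster-name/nodes/0/indices   # new name: freshly created and empty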
Thanks in advance.
I finally fixed this on my own: I stopped the cluster, deleted the nodes directory that had been created under the new cluster name, copied my old nodes directory over (being sure to respect the old structure exactly), chowned the folder to elasticsearch just in case, started the cluster back up, and breathed a huge sigh of relief to see 72 million documents!
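Roughly, the commands were along these lines (paths, cluster names, and the service name are placeholders based on the Ubuntu package layout, so adjust them to your install):

> sudo service elasticsearch stop
> sudo rm -rf /var/lib/elasticsearch/new-cluster-name/nodes                            # the empty dir ES created after the rename
> sudo cp -a /var/lib/elasticsearch/old-cluster-name/nodes /var/lib/elasticsearch/new-cluster-name/
> sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/new-cluster-name    # just in case
> sudo service elasticsearch start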
Related
We have a running instance of Elasticsearch 6.6 with several indices, so I took a snapshot of the two indices I am interested in. I set up a new dockerized single-node Elasticsearch 6.6 instance, where I attempted to restore the snapshot using curl. The indices were restored, but the 10 shards were all red. So I deleted the two restored indices and ran the operation again, this time in Kibana. After this restore, from the SAME snapshot, the shards were all green and my application that queries Elasticsearch was working!
I apologize for not having the output, but I have left work for the week, so I can't yet post the specifics of my snapshotting and restoring. Do any of you have suggestions about what might have caused the restore via curl to appear to work while leaving the shards all red? And why deleting and re-restoring via Kibana had a better outcome? I definitely set include_global_state to false when taking the snapshot. If it's still not clear why this is happening, I will post more specifics on Monday. Thanks in advance!
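For reference, the restore call was along these lines (repository, snapshot, and index names are placeholders rather than my exact values):

> curl -XPOST "localhost:9200/_snapshot/my_repo/my_snapshot/_restore" -H 'Content-Type: application/json' -d '{"indices": "index_one,index_two", "include_global_state": false}'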
It appears that this was, simply, a permissions issue! I brought the container up with docker-compose and then invoked docker-compose exec my_elastic_container /bin/bash /scripts/import-data.sh. That script extracts the gzipped tar file containing the Elasticsearch snapshot from the other cluster. Doing docker-compose exec means the action is performed by the container's root user, but the snapshot restore is performed by Elasticsearch, which was started by the elasticsearch user. If I run chown -R elasticsearch:root /backups/* after extracting the archive and then make the call to restore the snapshot, things work. I will do more thorough testing tomorrow and edit this answer if I missed anything.
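In other words, the working sequence looks roughly like this (container, script, path, and repository names are the ones from my setup, so adjust to yours):

> docker-compose exec my_elastic_container /bin/bash /scripts/import-data.sh                  # runs as the container's root user
> docker-compose exec my_elastic_container sh -c 'chown -R elasticsearch:root /backups/*'     # give the files back to the elasticsearch user
> curl -XPOST "localhost:9200/_snapshot/my_repo/my_snapshot/_restore?wait_for_completion=true"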
We have a 5-node, 16-shard Elasticsearch cluster across 5 servers, plus a routing server and a monitoring server.
Scenario
A developer has accidentally deleted a number of documents from an index within the cluster. ES snapshots have not been set up, but through our VPS provider each of the servers has regular server-wide backups, and we can spin extra instances up and down easily as necessary. What is the fastest way to restore the lost records?
There is no guarantee that those backups from a regular backup tool are useful, because they do not provide a consistent point-in-time snapshot.
You should not try to bring those backups into your production cluster. You can try to stand up a second cluster in your non-production environment and load those backups into it, but there is zero guarantee that this will work.
The fastest way would be reindexing from the original source, I guess.
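For example, if the deleted documents can still be pulled from the system of record, re-pushing them with the bulk API is usually quickest. A sketch, with a made-up index name and NDJSON file (older versions may also expect a document type in the URL):

docs.ndjson (action line followed by the document, one pair per document, ending with a newline):

{ "index": { "_id": "1" } }
{ "title": "first restored document" }
{ "index": { "_id": "2" } }
{ "title": "second restored document" }

> curl -XPOST "localhost:9200/my_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @docs.ndjson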
I've just started to learn Elasticsearch, and even though I'm reading the documentation and understand some aspects, I have a long way to go before I feel at least somewhat comfortable using it. The problem I'm having is that I don't understand whether, when integrating Elastic with a .NET project, I will still need a relational DB to hold all of my data, or whether I won't need one anymore, since with Elastic I can create indexes (which, from my understanding, are sort of a database by themselves) that are stored on nodes which make up a cluster, so basically my data ends up residing on the cluster.
Can you give me a simple explanation with a simple example? I'm trying to implement a search page; this is my use case, and Elastic seems to know its way around searching.
Thank you all.
Data will be persisted physically in each node of the cluster under the /data folder.
Ideally you'll do regular backups (see the snapshot sketch after the example below).
As an example I suggest this (working on a cluster with a single node):
Start elastic and put some data in it.
Back up and then delete the /data folder. You can't find the data anymore.
Put the folder back and do a get. It works!
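And for the regular backups mentioned above, a minimal sketch using a shared-filesystem snapshot repository (the location must also be whitelisted via path.repo in elasticsearch.yml; names and paths are placeholders):

> curl -XPUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mount/backups/my_backup"}}'
> curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"   # snapshot all indices and wait for it to finish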
Well, I started piping data into ES until it ran out of memory and crashed. I run free and I see that all memory is entirely used up.
I want to delete some of the data (old data), but I can't query localhost:9200; it rejects the connection.
How do I fix this so that I can delete the old data?
If you want to go hardcore about it, you can always delete everything in your data folder:
> rm -rf $ES_HOME/data/<clustername>
Note: replace <clustername> with your real cluster name (the default is elasticsearch)
Stop indexing. If the node stabilizes itself after a few minutes, then try deleting the data again. Restart the cluster.
If it's still stuck, stop the indexing and restart the cluster.
In any case, if the nodes went OOM they need to be restarted, as the state the JVM is in is unknown.
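Once the node responds again, the cheapest way to get rid of old data is to drop whole indices rather than deleting individual documents out of a live index; for time-based data that's something like this (the index pattern is made up):

> curl -XDELETE "localhost:9200/logstash-2014.01.*"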
I have a lot of data indexed in my Elasticsearch.
I deleted the Elasticsearch folder, then extracted a fresh zip of Elasticsearch again and started the Elasticsearch server.
I am surprised because after starting the new Elasticsearch server, I again found all the old data, and this problem persists again and again.
Can anyone please help me? I don't want all the old data to end up indexed in Elasticsearch again.
Regards
Given the cluster health response, it's not a problem with multiple nodes running on the same cluster, as suggested by Igor. I'd suggest you check the Java processes running. You might have an Elasticsearch instance hanging somewhere that keeps writing to that folder.
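A quick way to check, assuming a Linux box and a default-ish data path (adjust the path to your install):

> ps aux | grep -i [e]lasticsearch            # any stray Elasticsearch/Java processes still alive?
> sudo lsof +D /var/lib/elasticsearch | head  # what still has files open under the data dir?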