Elasticsearch Out of Memory Crash -- How to Delete Data? - elasticsearch

Well, I started piping data into ES until it ran itself out of memory and crashed. Running free, I can see that all memory is entirely used up.
I want to delete some of the old data, but I can't query against localhost:9200; it rejects the connection.
How do I fix this so that I can delete the old data?

If you want to go hardcore about it, you can always delete everything in your data folder:
> rm -rf $ES_HOME/data/<clustername>
Note: replace <clustername> with your real cluster name (the default is elasticsearch)

Stop indexing. If the node stabilizes itself after a few minutes, try deleting the old data again. If it's still stuck, restart the cluster.
In any case, if the nodes went OOM they need to be restarted, as the state the JVM is left in is unknown.
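Once the node is responsive again, a gentler option than removing files on disk is to delete old indices through the REST API. A minimal sketch, using a placeholder index name old-logs-2014.01 (list your real indices first):
> curl 'http://localhost:9200/_cat/indices?v'
> curl -XDELETE 'http://localhost:9200/old-logs-2014.01'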

Related

Creating snapshot from multi-node elasticsearch cluster, restoring on single-node, shards red

We have a running instance of elasticsearch 6.6 that has several indices, so I took a snapshot of the two indices that I am interested in. I set up a new dockerized single-node elasticsearch 6.6 instance, where I attempted to restore the snapshot by using curl. The indices were restored, but the 10 shards were all red. So, I deleted the two restored indices, and ran the operation again, but this time in Kibana. After this restore operation, with restoring from the SAME snapshot, the shards were now all green and my application that queries elasticsearch was working!
I apologize for not having the output, but I have left work for the week, so I can't yet post the specifics of my snapshotting and restoring. Do any of you have suggestions about what might have caused the restore via curl to appear to have worked while leaving the shards all red? And why did deleting and re-restoring via Kibana work better? I definitely set include_global_state to false when taking the snapshot. If it's still not clear why this happened, I will post more specifics on Monday. Thanks in advance!
It appears that this was, simply, a permissions issue! I brought the container up with docker-compose, and then I invoked docker-compose exec my_elastic_container /bin/bash /scripts/import-data.sh. That script extracted the gzipped tar file that contained the elasticsearch snapshot from the other cluster. Well, doing docker-compose exec means that the action is being done by the container's root user, but the snapshot restore operation is being done by elasticsearch, which was started by the elasticsearch user. If I perform chown -R elasticsearch:root /backups/* after extracting the archive, and then make the call to restore the snapshot, things are working. I will do more thorough testing tomorrow, and edit this answer if I missed anything.
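For reference, a rough sketch of the fixed sequence, assuming the snapshot archive is extracted under /backups inside the container and a filesystem repository named my_repo already points at it (the repository and snapshot names are placeholders):
> docker-compose exec my_elastic_container /bin/bash /scripts/import-data.sh
> docker-compose exec my_elastic_container chown -R elasticsearch:root /backups
> curl -XPOST 'http://localhost:9200/_snapshot/my_repo/my_snapshot/_restore?wait_for_completion=true'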

Manually start HDFS every time I boot?

Laconically: Should I start HDFS every time I come back to the cluster after a power-off?
I have successfully created a Hadoop cluster (after losing some battles) and now I want to be very careful about how I proceed.
Should I execute start-dfs.sh every time I power on the cluster, or is it ready to execute my application's code? The same goes for start-yarn.sh.
I am afraid that if I run it without everything being fine, it might leave garbage directories after execution.
Just from playing around with the Hortonworks and Cloudera sandboxes, I can say turning them on and off doesn't seem to demonstrate any "side-effects".
However, it is necessary to start the needed services every time the cluster starts.
As far as power cycling goes in a real cluster, it is recommended to stop the services running on the respective nodes before powering them down (stop-dfs.sh and stop-yarn.sh). That way there are no weird problems and any errors on the way to stopping the services will be properly logged on each node.
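As a rough sketch, the power-on and power-off sequence on the master host might look something like this (assuming the scripts live in $HADOOP_HOME/sbin, which may differ in your distribution):
> $HADOOP_HOME/sbin/start-dfs.sh    # after power-on: NameNode, DataNodes, SecondaryNameNode
> $HADOOP_HOME/sbin/start-yarn.sh   # ResourceManager and NodeManagers
> jps                               # verify the expected daemons are running
> $HADOOP_HOME/sbin/stop-yarn.sh    # before power-off: stop YARN first
> $HADOOP_HOME/sbin/stop-dfs.sh     # then stop HDFS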

dncp_block_verification log file increases size in HDFS

We are using Cloudera CDH 5.3. I am facing a problem wherein the size of "/dfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr" and "dncp-block-verification.log.prev" keeps increasing to TBs within hours. I read in some blogs that this is an HDFS bug. A temporary solution is to stop the DataNode services and delete these files, but we have observed that the log file then grows again on one DataNode or another (even on the same node after deleting it), so it requires continuous monitoring.
Does anyone have a permanent solution to this problem?
One solution, although slightly drastic, is to disable the block scanner entirely by setting the key dfs.datanode.scan.period.hours to 0 in the HDFS DataNode configuration (the default is 504 hours). The downside is that your DataNodes may not auto-detect corrupted block files (they would instead be detected only when a future client reads the block); this isn't a big deal if your replication factor is around 3, but you should consider the change a short-term one until you upgrade to a release that fixes the issue.
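For illustration, the property would go into hdfs-site.xml on the DataNodes (or the equivalent advanced configuration "safety valve" snippet if you manage the cluster with Cloudera Manager); a sketch:
<!-- hdfs-site.xml: disable the DataNode block scanner (default 504 hours) -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>0</value>
</property>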
Note that this problem will not occur if you upgrade to CDH 5.4.x or a later release, which includes the HDFS-7430 rewrite and its associated bug fixes. Those changes did away with the local verification log file entirely, thereby removing the problem.

How to recover data from a renamed Elasticsearch cluster?

I have just spent the best part of 12 hours indexing 70 million documents into Elasticsearch (1.4) on a single-node, single-server setup on an EC2 Ubuntu 14.04 box. This completed successfully; however, before taking a snapshot of my server, I thought it would be wise to rename the cluster to prevent it accidentally joining production boxes in the future. What a mistake that was! After renaming it in the elasticsearch.yml file and restarting the ES service, my indexes have disappeared.
I saw the data was still present in the data dir under the old cluster name, so I tried stopping ES, moving the data manually in the filesystem, and then starting the ES service again, but still no luck. I then tried renaming back to the old cluster name and putting everything back in place, and still nothing. The data is still there, all 44 GB of it, but I have no idea how to get it back. I have spent the past 2 hours searching and all I can seem to find is advice on how to restore from a snapshot, which I don't have. Any advice would be hugely appreciated - I really hope I haven't lost a day's work. I will never rename a cluster again!
Thanks in advance.
I finally fixed this on my own: I stopped the cluster, deleted the nodes directory that had been created under the new cluster name, copied my old nodes directory over (being sure to respect the old structure exactly), chowned the folder to elasticsearch just in case, started up the cluster, and breathed a huge sigh of relief to see 72 million documents!
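For anyone in the same spot, the sequence was roughly the following (a sketch only; /var/lib/elasticsearch is an assumption for the data path, and oldcluster/newcluster are placeholders for the old and new cluster names):
> sudo service elasticsearch stop
> rm -rf /var/lib/elasticsearch/newcluster/nodes      # remove the empty nodes dir created under the new name
> cp -a /var/lib/elasticsearch/oldcluster/nodes /var/lib/elasticsearch/newcluster/
> chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/newcluster
> sudo service elasticsearch start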

Cassandra Snapshot and Restart

Being a level 1 novice in Linux (Ubuntu 9), shell and cron, I've had some difficulty figuring this out. Each night, I'd like to take a snapshot of our Cassandra nodes and restart the process.
Why? Because our team is hunting down a memory leak that requires a process restart every 3 weeks or so. The root cause has been difficult to track down. In the meantime, I'd like to put these cron jobs in place to reduce service interruption.
Thanks in advance for anyone who has some of these already figured out!
The general procedure is as follows (a scripted sketch appears after the list):
Run nodetool drain (http://www.riptano.com/docs/0.6/utilities/nodetool#nodetool-drain) on the node
Run nodetool snapshot
Kill the cassandra process
Start the cassandra process
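Strung together for a nightly cron job, those four steps might look something like this (a sketch; it assumes Cassandra runs under an init script named cassandra and that nodetool is on the PATH):
> nodetool -h localhost drain      # flush memtables and stop accepting writes
> nodetool -h localhost snapshot   # hard-link a snapshot of the SSTables under each data directory
> sudo service cassandra stop
> sudo service cassandra start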
When running nodetool snapshot, it is very important that you have JNA set up and working. This includes:
Having jna.jar in Cassandra's lib directory and either:
Running Cassandra as root, or
Increasing the memory locking limit using 'ulimit -l' or something like /etc/security/limits.conf (see the example below)
If this is all correct, you should see a message about "mlockall" succeeding in the logs on startup.
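If you go the limits.conf route rather than running Cassandra as root, the entries would look something like this (assuming the process runs as a user named cassandra):
# /etc/security/limits.conf
cassandra soft memlock unlimited
cassandra hard memlock unlimited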
The other thing to keep an eye on is your disk space usage; this will grow as compactions occur and the old SSTables are replaced (but their snapshots remain).
