Clear all deleted data in Elasticsearch - elasticsearch

I'm new to ES, so this question may be a naive one.
I was experimenting with ES: creating an index, putting some data in it (1 million records), deleting it afterwards, and then creating it again with the same name.
It seems that ES is not actually deleting the data in the index (via curl DELETE), as the disk space is not freed after all the deletes. Right now 1 million records seem to take 40 GB of disk space.
Is there any way to purge the deleted data completely so that the space is actually freed?

If it's just for experimentation, a quick and dirty way would be to delete your data directory.
Another way to reclaim disk space is to run this command (it applies to old ES versions; later versions replaced _optimize with _forcemerge):
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
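To check whether deleted documents are what is holding the space, one option is to look at the docs.deleted counter reported by the index stats API. Below is a minimal Python sketch, assuming a response shaped like the output of GET /<index>/_stats/docs. The deleted_ratio helper and the sample numbers are made up for illustration, not taken from a real cluster.

```python
def deleted_ratio(stats):
    """Fraction of primary-shard docs that are marked deleted but still on disk.

    Assumes `stats` is the parsed JSON body of GET /<index>/_stats/docs.
    """
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]   # live docs + tombstoned docs
    return docs["deleted"] / total if total else 0.0


# Hypothetical sample response, trimmed to the fields used above.
sample = {"_all": {"primaries": {"docs": {"count": 750_000, "deleted": 250_000}}}}

ratio = deleted_ratio(sample)
print(f"{ratio:.0%} of the documents on disk are marked as deleted")
```

A high ratio suggests that expunging deletes would help. A ratio near zero suggests the space is being held by something else.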

Related

Elasticsearch index is taking up too much disk space

I have an index in production with 1 replica (about 1 TB in total). New data arrives in this index constantly (a lot of updates and creates).
When I created a copy of this index by running _reindex (same data, also with 1 replica), the new index took only 600 GB.
It looks like the original index contains a lot of junk, perhaps some kind of logs, that could be cleaned up, but I'm not sure how to do it.
The questions: how can I clean up the index (without _reindex), why is this happening, and how can I prevent it in the future?
Lucene segment files are immutable, so when you delete or update a document (an update can't change the doc in place), the old version is just marked as deleted but not actually removed from disk. ES runs merge operations periodically to "defragment" the data, but you can also trigger a merge manually with _forcemerge (try running it with only_expunge_deletes as well: it might be faster).
Also, make sure your shards are sized correctly and use ILM rollover to keep index size under control.
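The behaviour described above can be illustrated with a toy model. This is only a sketch of the idea (immutable segments, deletes recorded as markers, merges rewriting live docs). The Segment class and its "size" are invented for illustration and have nothing to do with Lucene's real on-disk format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for an immutable segment (not the real Lucene format)."""
    docs: tuple                        # (doc_id, payload) pairs, written once
    deleted: frozenset = frozenset()   # doc_ids that are only *marked* deleted

    def live_docs(self):
        return [(i, p) for i, p in self.docs if i not in self.deleted]

    def size_on_disk(self):
        # Marked-deleted docs still take space until a merge rewrites the segment.
        return len(self.docs)


def mark_deleted(segment, doc_id):
    """A delete only adds a marker; the stored docs are left untouched."""
    return Segment(segment.docs, segment.deleted | {doc_id})


def merge(segments):
    """Write one new segment that contains only the live docs."""
    return Segment(tuple(d for s in segments for d in s.live_docs()))


seg_a = Segment(((1, "a"), (2, "b"), (3, "c")))
seg_b = Segment(((4, "d"), (5, "e")))

seg_a = mark_deleted(seg_a, 2)
seg_a = mark_deleted(seg_a, 3)

before = seg_a.size_on_disk() + seg_b.size_on_disk()
merged = merge([seg_a, seg_b])
after = merged.size_on_disk()
print(before, "->", after)
```

In this model the two deletes change nothing about the space used. Only the merge, which copies the surviving docs into a fresh segment, makes the total shrink.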

Manual delete of indices in Elasticsearch not freeing up space

I have 2 TB of indices. Manually deleting some indices removes them from Kibana, etc. I can delete an index via curl or Kibana, and the delete is acknowledged and the index is removed. However, it is not freeing up the space.
I also went ahead and removed the ILM policy from the index before deleting a few indices; still no luck.
Even though I removed a whole index, I also tried POST _forcemerge, to no avail.
How can I recover space now that the indices are deleted?
For those who look at this later:
Deleting a whole index should free up space instantly! Does not require _forcemerge, etc.
The issue here was the use of a ZFS file system which required a snapshot to be cleared to recover space.

Reclaim disk space after deleting files in Elasticsearch

When I delete documents from Elasticsearch, why does my 'total size' stay the same despite obviously being far smaller with the absence of previously stored data?
I've read about index optimization but I'm not sure what this is or how to do it.
Thanks
I'm sure there are tons of questions relating to this on both SO and Google so this may be a duplicate answer. However - deleting documents only marks them as deleted, it doesn't actually remove them from your data store.
In old ES versions there was a feature named 'optimize' (now deprecated); nowadays forcemerge is the enhanced replacement. The following command should free up the space you're entitled to.
curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
Here's a bit more info on forcemerge if you're interested:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
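For anyone who prefers scripting this over curl, here is a rough Python equivalent of the command above using only the standard library. The forcemerge_request helper and the index name my-index are hypothetical. The sketch only builds the request, and sending it is left as a commented-out line because it needs a running cluster.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def forcemerge_request(base_url, index=None, only_expunge_deletes=True):
    """Build (but do not send) a POST _forcemerge request.

    With index=None the request targets all indices, like the curl command.
    """
    path = f"/{index}/_forcemerge" if index else "/_forcemerge"
    query = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return Request(f"{base_url.rstrip('/')}{path}?{query}", method="POST")


req = forcemerge_request("http://localhost:9200", index="my-index")
print(req.get_method(), req.full_url)

# Against a live cluster you would then send it, e.g.:
# print(urlopen(req).read().decode())
```

Limiting the call to one index at a time can be gentler on a busy cluster than merging everything at once, since a force merge is I/O heavy.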

How to free up unused space after deleting documents in ElasticSearch?

When deleting records in ElasticSearch, I heard that the disk space is not freed up. So if I only wanted to keep rolling three months of documents in a type, how do I ensure that disk space is reused?
The system will naturally re-use the space freed up as it needs to, provided the files have been marked as such by ElasticSearch.
However, ElasticSearch goes through a series of stages. Even 'retiring' the data will not remove it from the system, only hide it away.
This command should do what you need:
DELETE /
See here for more information: https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html
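One common way to keep a rolling three-month window is to write into one index per month and drop whole indices once they age out. Deleting a whole index frees its space directly, with no merge needed. The sketch below assumes hypothetical index names of the form logs-YYYY.MM. The prefix, the naming scheme and the expired_monthly_indices helper are illustrations, not something from the question.

```python
from datetime import date


def expired_monthly_indices(index_names, today, keep_months=3, prefix="logs-"):
    """Return the monthly indices that fall outside the retention window.

    Assumes hypothetical names like 'logs-2024.01' (prefix + YYYY.MM).
    The current month counts as the first of the `keep_months` kept.
    """
    current = today.year * 12 + (today.month - 1)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indices
        year, month = name[len(prefix):].split(".")
        age_in_months = current - (int(year) * 12 + int(month) - 1)
        if age_in_months >= keep_months:
            expired.append(name)
    return expired


names = ["logs-2023.11", "logs-2023.12", "logs-2024.01", "logs-2024.02", "logs-2024.03"]
to_delete = expired_monthly_indices(names, today=date(2024, 3, 15))
print(to_delete)
```

Each name returned would then be removed with a delete-index call (DELETE /<index name>), for example from a daily cron job.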

Backing up, Deleting, Restoring Elasticsearch Indexes By Index Folder

Most of the ElasticSearch documentation discusses working with the indexes through the REST API - is there any reason I can't simply move or delete index folders from the disk?
You can move data around on disk, to a point:
If Elasticsearch is running, it is never a good idea to move or delete the index folders, because Elasticsearch will not know what happened to the data, and you will get all kinds of FileNotFoundExceptions in the logs as well as indices that are red until you manually delete them.
If Elasticsearch is not running, you can move index folders to another node (for instance, if you were decommissioning a node permanently and needed to get the data off). However, if you delete the folder, or move it to a place where Elasticsearch cannot see it when the service is restarted, then Elasticsearch will be unhappy. This is because Elasticsearch writes what is known as the cluster state to disk, and in this cluster state the indices are recorded, so if ES starts up and expects to find index "foo", but you have deleted the "foo" index directory, the index will stay in a red state until it is deleted through the REST API.
Because of this, I would recommend that if you want to move or delete individual index folders, you use the REST API whenever possible, as it's possible to get ES into an unhappy state if you delete a folder that it expects to find an index in.
EDIT: I should mention that it's safe to copy (for backups) an indices folder, from the perspective of Elasticsearch, because copying doesn't modify the contents of the folder. Sometimes people do this to perform backups outside of the snapshot & restore API.
I use this procedure: I close, backup, then delete the indexes.
curl -XPOST "http://127.0.0.1:9200/*index_name*/_close"
After this point all index data is on disk and in a consistent state, and no writes are possible. I copy the directory where the index is stored and then delete the index:
curl -XDELETE "http://127.0.0.1:9200/*index_name*"
Closing the index stops all access to it. Then I send a command to delete the index (and all corresponding files on disk).
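The close / backup / delete procedure could be scripted roughly as follows. This is a sketch under assumptions: the retire_index_requests helper and the index name my-index are hypothetical, the delete step is assumed to be the standard delete-index call (HTTP DELETE on the index URL), and the directory copy in between is left to you because the data path depends on the installation.

```python
from urllib.request import Request, urlopen


def retire_index_requests(base_url, index):
    """Build the two REST calls that bracket the manual backup step."""
    base = base_url.rstrip("/")
    close_req = Request(f"{base}/{index}/_close", method="POST")   # stop reads and writes
    delete_req = Request(f"{base}/{index}", method="DELETE")       # remove index and its files
    return close_req, delete_req


close_req, delete_req = retire_index_requests("http://127.0.0.1:9200", "my-index")
for req in (close_req, delete_req):
    print(req.get_method(), req.full_url)

# Against a live node the order would be:
#   urlopen(close_req)
#   ... copy the index directory somewhere safe ...
#   urlopen(delete_req)
```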
