I have 2 TB of indices. When I manually delete some of them, via curl or Kibana, the request is acknowledged and the indices disappear from Kibana, but the disk space is not freed.
I also removed the ILM policy from a few indices before deleting them, still no luck.
Even though I deleted whole indices, I also tried POST _forcemerge, to no avail.
How can I recover space now that the indices are deleted?
For those who look at this later:
Deleting a whole index should free up space immediately. It does not require _forcemerge or anything else.
The issue here was the ZFS file system: a snapshot was still holding on to the deleted files and had to be cleared before the space was recovered.
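In case it helps someone in the same spot, here is a rough sketch of how one might check for this. The dataset name tank/elasticsearch and the snapshot name in the usage line are made up, and DRY_RUN is only a helper variable of this sketch that prints each command instead of running it:

```shell
#!/bin/sh
# Sketch only: dataset and snapshot names passed to these functions are hypothetical.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the space breakdown of a dataset; the USEDSNAP column is space held by snapshots.
zfs_space() {
  run zfs list -o space "$1"
}

# List the snapshots under a dataset, sorted by the space each one holds.
zfs_snapshots() {
  run zfs list -t snapshot -o name,used,referenced -s used -r "$1"
}

# Destroy one snapshot. This cannot be undone, so check the name first.
zfs_drop_snapshot() {
  run zfs destroy "$1"
}
```

For example, `DRY_RUN=1 zfs_snapshots tank/elasticsearch` only prints the command, while `zfs_drop_snapshot tank/elasticsearch@old` would really destroy that snapshot.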
I'm using Elasticsearch 7.5.2 on Ubuntu. Recently, I began using Elasticsearch to display relevant search results on every page load. This shot up the volume, but I also found out that it has created large index files. Note that I'm using 'app-search' to power my queries.
Here are sample index files that are occupying too much space:
.app-search-analytics-logs-loco_togo_production-7.1.0-2020.01.26 => 52 GB
.app-search-analytics-logs-loco_togo_production-7.1.0-2020.01.27 => 53 GB
I tried deleting these using curl, but they reappear and show less space (~5 GB each).
I want to know if there is a way to control these indices. I'm not sure what purpose they serve, or whether there is a way to prevent them from being created.
Q: I tried deleting these using curl, but they reappear and show less space (~5 GB each).
Your delete request was clearly executed. It seems the indices are still being written to: if documents keep arriving in Elasticsearch, the index is re-created.
So for example:
The index for 2020.01.27 held 53 GB before the deletion. After you delete it, both the data and the index itself are gone. But as soon as new documents for that same day (2020.01.27) are indexed, the index is re-created and contains only the documents that arrived after the deletion, which is probably the 5 GB you see.
If this is not what you want, you need to check if there are some sources still sending data.
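One rough way to check for active writers is to sample the document counts twice and compare. This is only a sketch: it assumes Elasticsearch answers on http://localhost:9200 without authentication, and the function names are mine. Note that a changed count can also come from deletes, so treat the result as a hint:

```shell
#!/bin/sh
# Sketch only: assumes Elasticsearch at http://localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# URL that lists each matching index with its current document count.
doc_count_url() {
  echo "${ES_URL}/_cat/indices/$1?h=index,docs.count&s=index"
}

# Compare two samples of that listing and say whether anything changed.
report_change() {
  if [ "$1" = "$2" ]; then
    echo "no new documents"
  else
    echo "still receiving documents"
  fi
}

# Sample the document counts twice, a number of seconds apart (default 60).
check_writers() {
  before="$(curl -s "$(doc_count_url "$1")")"
  sleep "${2:-60}"
  after="$(curl -s "$(doc_count_url "$1")")"
  report_change "$before" "$after"
}
```

Usage would be something like `check_writers '.app-search-analytics-logs-*' 30`.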
Hope this helps.
EDIT:
Q: However, is there a way to manage these indices? I don't want them to eat up too much space.
Yes! Index Lifecycle Management (ILM) is what you are looking for. It aims to automate the maintenance/management of indices. So for example you could define a rollover every 30GB to a new index in order to keep them small. Another example is to delete the index after X days. Take a look at all the phases and actions.
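As an illustration, a policy with a size-based rollover and a delete phase could be created roughly like this. The policy name, the 30gb limit and the 7d retention in the usage line are example values, not something from your setup, and I have not checked whether the indices managed by app-search can be attached to a custom policy:

```shell
#!/bin/sh
# Sketch only: assumes Elasticsearch at http://localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print an ILM policy body: roll over at a size limit, delete after a retention period.
ilm_policy_json() {
  max_size="$1"
  retention="$2"
  cat <<EOF
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "${max_size}" }
        }
      },
      "delete": {
        "min_age": "${retention}",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
}

# Create or update the named policy through the ILM API.
put_ilm_policy() {
  ilm_policy_json "$2" "$3" |
    curl -s -XPUT "${ES_URL}/_ilm/policy/$1" \
      -H 'Content-Type: application/json' --data-binary @-
}
```

For example `put_ilm_policy analytics-logs-policy 30gb 7d`. The policy only has an effect once an index (or its template) references it through the index.lifecycle.name setting.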
When I delete documents from Elasticsearch, why does my 'total size' stay the same despite obviously being far smaller with the absence of previously stored data?
I've read about index optimization but I'm not sure what this is or how to do it.
Thanks
I'm sure there are tons of questions relating to this on both SO and Google so this may be a duplicate answer. However - deleting documents only marks them as deleted, it doesn't actually remove them from your data store.
In older versions of ES there was a feature named 'optimize' (now deprecated); nowadays forcemerge is its enhanced replacement. The following command should free up the space still held by deleted documents.
curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
Here's a bit more info on forcemerge if you're interested:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
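If you would rather target a single index and see whether the merge changed anything, something along these lines should work. It is a sketch that assumes Elasticsearch on http://localhost:9200 without authentication; the function names and the index name in the usage line are placeholders:

```shell
#!/bin/sh
# Sketch only: assumes Elasticsearch at http://localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# URL that shows live and deleted document counts plus on-disk size for an index.
deleted_docs_url() {
  echo "${ES_URL}/_cat/indices/$1?v&h=index,docs.count,docs.deleted,store.size"
}

# URL that asks Elasticsearch to merge away segments containing deleted documents.
expunge_url() {
  echo "${ES_URL}/$1/_forcemerge?only_expunge_deletes=true"
}

# Show the counters, run the merge, then show the counters again.
expunge_deletes() {
  curl -s "$(deleted_docs_url "$1")"
  curl -s -XPOST "$(expunge_url "$1")"
  echo
  curl -s "$(deleted_docs_url "$1")"
}
```

Calling `expunge_deletes my_index` prints docs.deleted and store.size before and after, which makes it easy to see how much was actually reclaimed.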
When deleting records in ElasticSearch, I heard that the disk space is not freed up. So if I only wanted to keep rolling three months of documents in a type, how do I ensure that disk space is reused?
The system will naturally re-use the space freed up as it needs to, provided the files have been marked as such by ElasticSearch.
However, ElasticSearch goes through a series of stages. Even 'retiring' the data will not remove it from the system, only hide it away.
This command should do what you need:
DELETE /
See here for more information: https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html
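To make the rolling window concrete: if the documents are written to one index per month, dropping the month that has just left the window frees its disk space at once. The sketch below assumes index names such as logs-2020.01 (a hypothetical naming scheme, not something from the question) and Elasticsearch on http://localhost:9200 without authentication:

```shell
#!/bin/sh
# Sketch only: the monthly "<prefix>YYYY.MM" index naming is an assumption.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "YYYY.MM" for the month N months before the given year and month.
months_before() {
  year="$1"
  month="${2#0}"   # strip a leading zero so "08" is not read as octal
  n="$3"
  total=$(( $year * 12 + $month - 1 - $n ))
  printf '%04d.%02d\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# Delete the monthly index that has just fallen out of the retention window.
drop_expired_month() {
  prefix="$1"
  keep="$2"
  expired="$(months_before "$(date +%Y)" "$(date +%m)" "$keep")"
  curl -s -XDELETE "${ES_URL}/${prefix}${expired}"
}
```

Run once a month (from cron, say), `drop_expired_month logs- 3` keeps the current month and the two before it.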
Most of the ElasticSearch documentation discusses working with the indexes through the REST API - is there any reason I can't simply move or delete index folders from the disk?
You can move data around on disk, to a point:
If Elasticsearch is running, it is never a good idea to move or delete the index
folders, because Elasticsearch will not know what happened to the data, and you
will get all kinds of FileNotFoundExceptions in the logs as well as indices
that are red until you manually delete them.
If Elasticsearch is not running, you can move index folders to another node (for
instance, if you were decommissioning a node permanently and needed to get the
data off). However, if you delete or move the folder to a place where
Elasticsearch cannot see it when the service is restarted, then Elasticsearch
will be unhappy. This is because Elasticsearch writes what is known as the
cluster state to disk, and in this cluster state the indices are recorded, so if
ES starts up and expects to find index "foo", but you have deleted the "foo"
index directory, the index will stay in a red state until it is deleted through
the REST API.
Because of this, I would recommend that, rather than moving or deleting
individual index folders on disk, you use the REST API whenever possible, as
it's possible to get ES into an unhappy state if you delete a folder that it
expects to find an index in.
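If a folder has already been removed by hand, the leftover red indices can be found and cleaned up through the REST API. A small sketch, assuming Elasticsearch on http://localhost:9200 without authentication (the function names are mine):

```shell
#!/bin/sh
# Sketch only: assumes Elasticsearch at http://localhost:9200 without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# URL that lists only the indices whose health is red.
red_indices_url() {
  echo "${ES_URL}/_cat/indices?v&health=red&h=index,health,status"
}

# Show the red indices.
list_red_indices() {
  curl -s "$(red_indices_url)"
}

# Delete an index through the REST API so the cluster state is updated as well.
delete_index() {
  curl -s -XDELETE "${ES_URL}/$1"
}
```

Running `list_red_indices` first and then `delete_index foo` for each index you really want gone keeps the cluster state and the disk in agreement.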
EDIT: I should mention that it's safe to copy (for backups) an index folder,
from the perspective of Elasticsearch, because copying doesn't modify the
contents of the folder. Sometimes people do this to perform backups outside of
the snapshot & restore API.
I use this procedure: I close, back up, then delete the indexes.
curl -XPOST "http://127.0.0.1:9200/*index_name*/_close"
After this point all index data is on disk and in a consistent state, and no writes are possible. I copy the directory where the index is stored and then delete it:
curl -XDELETE "http://127.0.0.1:9200/*index_name*"
Closing the index makes Elasticsearch stop all access to it. Then I send a command to delete the index (and all corresponding files on disk).
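Put together, the procedure might look like the sketch below. It rests on several assumptions that you should verify on your own node: Elasticsearch on http://localhost:9200 without authentication, the data path /var/lib/elasticsearch, and the nodes/0/indices/<uuid> directory layout, in which index folders are named by UUID rather than by index name. The function names are mine:

```shell
#!/bin/sh
# Sketch only: URL, data path and directory layout are assumptions, check them first.
ES_URL="${ES_URL:-http://localhost:9200}"
ES_DATA="${ES_DATA:-/var/lib/elasticsearch}"

# Directory that holds an index, given its UUID.
index_dir() {
  echo "${ES_DATA}/nodes/0/indices/$1"
}

# Close an index, archive its directory into a destination folder, then delete it.
archive_and_delete() {
  index="$1"
  dest="$2"
  # -f makes curl print nothing on an HTTP error, so a missing index yields an empty UUID.
  uuid="$(curl -sf "${ES_URL}/_cat/indices/${index}?h=uuid" | tr -d '[:space:]')"
  if [ -z "$uuid" ]; then
    echo "index not found: ${index}" >&2
    return 1
  fi
  curl -sf -XPOST "${ES_URL}/${index}/_close" || return 1
  tar -czf "${dest}/${index}.tar.gz" -C "${ES_DATA}/nodes/0/indices" "$uuid" || return 1
  curl -s -XDELETE "${ES_URL}/${index}"
}
```

It has to run on the node that holds the shards, with read access to the data directory, for example `archive_and_delete my_index /backups`. On a multi-node cluster the shards are spread over several nodes, so a single copy like this is not a complete backup.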
I'm new to ES, so the question may be somewhat stupid, but:
I was experimenting with ES: creating an index, putting some data in it (1 million records), deleting it afterwards and creating it again (with the same name).
It seems that ES is not actually deleting the data in the index (via curl DELETE), as the disk space is not freed after all the deletes. For now, 1 million records seem to take 40 GB of disk space.
Is there any way to delete the deleted data totally so it will actually free space?
If it's just for experimentation, a quick and dirty way would be to delete your data directory.
Another way to reclaim disk space is to run this command (from Elasticsearch 2.1 on, the endpoint is named _forcemerge instead of _optimize):
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'