Elasticsearch multi data path disk full - elasticsearch

I have a 500 GB hard drive split between the OS and Elasticsearch data. With it almost full, I added a second 1 TB hard drive and listed it as a second data path in elasticsearch.yml
(e.g. path.data: /els, /var/lib/elasticsearch).
Through Kibana I can see that the available space is now 1.5 TB, but every time I index a document it is written to the original hard drive, leaving the 1 TB drive empty.
Can someone help me?
Elasticsearch version: 6.6.1

If you send documents to an old index, that old index is not moved to the new path and stays on the old hdd. With multiple data paths, Elasticsearch does not relocate existing shards onto a path that has free space.
For further information, see the following docs:
The path.data settings can be set to multiple paths, in which case all paths will be used to store data (although the files belonging to a single shard will all be stored on the same data path):
If you want to extend the current path using the new hdd instead, you can use something like Logical Volume Management (LVM). It is an abstraction over drives, so you can attach several physical disk drives to a single logical volume.
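For reference, here is a minimal sketch of the multi-path setting in elasticsearch.yml, using the two paths from the question (adjust to your layout); remember that existing shards stay where they are, and only newly allocated shards can land on the second path:
# elasticsearch.yml - both directories must be writable by the elasticsearch user
path.data:
  - /els
  - /var/lib/elasticsearch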

Related

Is there some way to get elasticsearch index data from RAM?

I have a 60 GB text file and I want to search a text field in it. My plan is to load the file into Elasticsearch and set up search there.
But it might be that searching the text file would be quicker if the file were read from RAM.
So the question is: is there some way to load an Elasticsearch index into RAM and search it there? That would let me compare the speed of searching in Elasticsearch against searching the raw text file (JSON, .pickle, or another format).
I tried reading from the .pickle file using Python.
The version of Elasticsearch is 7.1.
No, it is not possible. In the first versions of ES (see https://www.elastic.co/guide/en/elasticsearch/reference/1.4/index-modules-store.html) it was, but not anymore. You should rely on ES to cache the contents that are used most frequently; there is nothing you can do to tell it to keep an index in memory.
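For context, the linked 1.4 documentation describes an in-memory store type that was set per index; a sketch of what that looked like, shown only for comparison since the option was removed long before 7.1:
# Elasticsearch 1.x only - this store type no longer exists in 7.x
index.store.type: memory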

Does Elasticsearch make a copy of my data?

I copied 1TB of data to a cloud server, then ran Elasticsearch on that folder. Things seemed to index great. However, I noticed that hard disk space went from 33% used to 90% used. So it seems Elastic must have copied the source directory? Can I now delete that 1TB of original data from that machine?
If you run GET _stats?human you'll see lots of details about your cluster, such as how much storage you are using and how many documents have been added. If you have all the data you want in your cluster and it's correctly structured, you can delete the original data. Elasticsearch has its own copy.
BTW by default you will get 1 replica if you have more than 1 node; so 1 primary and 1 replica copy of the data. If you have a single node there will only be the primary one.
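A quick way to check this from the command line, assuming Elasticsearch is listening on localhost:9200:
curl -XGET "http://127.0.0.1:9200/_stats?human&pretty"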

How to free up unused space after deleting documents in ElasticSearch?

When deleting records in ElasticSearch, I heard that the disk space is not freed up. So if I only want to keep a rolling three months of documents in a type, how do I ensure that disk space is reused?
The system will naturally re-use the freed-up space as it needs to, provided the files have been marked as deleted by ElasticSearch.
However, ElasticSearch goes through a series of stages before that happens; even 'retiring' the data will not remove it from the system, only hide it away.
This command should do what you need:
DELETE /
See here for more information: https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html
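As a sketch of the rolling three-month approach: write to a new time-based index each month and drop whole indices once they age out, since deleting an index frees its files immediately, unlike deleting individual documents. The index name below is purely hypothetical:
# delete the oldest monthly index once it falls outside the three-month window
curl -XDELETE "http://127.0.0.1:9200/logs-2019.01"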

Backing up, Deleting, Restoring Elasticsearch Indexes By Index Folder

Most of the ElasticSearch documentation discusses working with the indexes through the REST API - is there any reason I can't simply move or delete index folders from the disk?
You can move data around on disk, to a point -
If Elasticsearch is running, it is never a good idea to move or delete the index
folders, because Elasticsearch will not know what happened to the data, and you
will get all kinds of FileNotFoundExceptions in the logs as well as indices
that are red until you manually delete them.
If Elasticsearch is not running, you can move index folders to another node (for
instance, if you were decommissioning a node permanently and needed to get the
data off); however, if you delete or move the folder to a place where
Elasticsearch cannot see it when the service is restarted, then Elasticsearch
will be unhappy. This is because Elasticsearch writes what is known as the
cluster state to disk, and in this cluster state the indices are recorded, so if
ES starts up and expects to find index "foo", but you have deleted the "foo"
index directory, the index will stay in a red state until it is deleted through
the REST API.
Because of this, I would recommend that if you want to move or delete individual
index folders from disk, you use the REST API whenever possible, as it's
possible to get ES into an unhappy state if you delete a folder that it expects
to find an index in.
EDIT: I should mention that it's safe to copy (for backups) an indices folder,
from the perspective of Elasticsearch, because it doesn't modify the contents of
the folder. Sometimes people do this to perform backups outside of the snapshot
& restore API.
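If you would rather have an API-managed backup than a folder copy, here is a minimal sketch of the snapshot & restore flow mentioned above (the repository name and location are assumptions, and the location must be listed under path.repo in elasticsearch.yml first):
curl -XPUT "http://127.0.0.1:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d '{"type": "fs", "settings": {"location": "/mount/backups/my_backup"}}'
curl -XPUT "http://127.0.0.1:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"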
I use this procedure: I close, back up, and then delete the indexes.
curl -XPOST "http://127.0.0.1:9200/*index_name*/_close"
After this point all index data is on disk and in a consistent state, and no writes are possible. I copy the directory where the index is stored and then delete it:
curl -XDELETE "http://127.0.0.1:9200/*index_name*"
By closing the index, Elasticsearch stops all access to it. Then I send a command to delete the index (and all of its corresponding files on disk).
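Putting those steps together, a rough sketch of the whole procedure; the index name, data path, and backup path are placeholders, and depending on the version the on-disk directory is named after the index or after its UUID, so locate it first:
# 1. close the index so no further writes can happen
curl -XPOST "http://127.0.0.1:9200/my_index/_close"
# 2. copy the index directory out of the data path
cp -a /var/lib/elasticsearch/nodes/0/indices/my_index /backups/my_index
# 3. delete the index through the API, which also removes its files from disk
curl -XDELETE "http://127.0.0.1:9200/my_index"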

Moving an elasticsearch index of one node in a machine to another drive of the same machine

I have an Elasticsearch node on a machine with a 150 GB SSD and a 3 TB HDD. Since I am running out of space on the SSD, I would like to move one index from the SSD to the HDD. Is this possible? If so, how?
I could create another node on the hdd, but I'd rather have one node in the machine...
Thanks!
You can safely move the data directory (and individual indexes and even shards) around. We've scp'd entire indexes around in this manner.
You probably should not actively index or delete when you are doing this though, or unpredictable things could happen.
Once you do the move, you just need to tell Elasticsearch where to find the data directory. You set this in the Elasticsearch config file found in /etc/elasticsearch.
Just add this setting:
path:
  logs: /path/to/log/files
  data: /path/to/data/directory
You might want to cp and not mv, just in case things don't go as planned.
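As a rough sketch of the full move on a typical Linux install (service name and paths are assumptions; adjust to your layout and keep everything owned by the elasticsearch user):
sudo systemctl stop elasticsearch
# copy (not move) the whole data directory to the HDD, preserving ownership and permissions
sudo cp -a /var/lib/elasticsearch /mnt/hdd/elasticsearch
sudo chown -R elasticsearch:elasticsearch /mnt/hdd/elasticsearch
# then point path.data at the new location in /etc/elasticsearch/elasticsearch.yml:
#   path.data: /mnt/hdd/elasticsearch
sudo systemctl start elasticsearch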
