How can I delete ES clusters?
Every time I start ES locally, it restores my indices as part of the cluster state, which is now up to 33 indices, and I believe this is taking up much of my RAM (8 GB).
I only have 3 very small indices, the biggest being just about 3 MB.
Simply delete all the indices that you do not need. Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
You should delete the indices that you no longer want.
For example, first list your indices:
curl -X GET 'http://127.0.0.1:9200/_cat/indices?v=true'
I got an index called web, for example; to delete it:
curl -X DELETE http://127.0.0.1:9200/web
I have an index in production with 1 replica (in total it takes ~1 TB). New data is constantly coming into this index (a lot of updates and creates).
When I created a copy of this index by running _reindex (with the same data and 1 replica as well), the new index only takes 600 GB.
It looks like there is a lot of junk and some kind of logs in the original index that could be cleaned up, but I'm not sure how to do it.
The questions: how can I clean up the index (without _reindex), why is this happening, and how can I prevent it in the future?
Lucene segment files are immutable, so when you delete or update a document (Lucene can't update a doc in place), the old version is just marked as deleted but not actually removed from disk. ES periodically runs merge operations to "defragment" the data, but you can also trigger a merge manually with _forcemerge (try running it with only_expunge_deletes as well; it might be faster).
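For example, a minimal one-liner against a single index (my-index is a placeholder for your real index name, and a default local node on port 9200 is assumed):
curl -XPOST 'http://localhost:9200/my-index/_forcemerge?only_expunge_deletes=true'
Deleted documents are then expunged from the merged segments, which is where the disk space is actually reclaimed.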
Also, make sure your shards are sized correctly and use ILM rollover to keep index size under control.
I'm using Elasticsearch 7.5.2 on Ubuntu. Recently, I began using Elasticsearch to display relevant search results on every page load. This shot up the query volume, but I also found out that it has created large index files. Note that I'm using 'app-search' to power my queries.
Here are sample index files that are occupying too much space:
.app-search-analytics-logs-loco_togo_production-7.1.0-2020.01.26 => 52 GB
.app-search-analytics-logs-loco_togo_production-7.1.0-2020.01.27 => 53 GB
I tried deleting these using CURL, but they reappear and show lesser space (~5 GB each).
I want to know if there is a way to control these indices. I'm not sure what purpose these indices serve and whether there is a way to prevent them.
I tried deleting these using CURL, but they reappear and show lesser space (~5 GB each).
Your delete action was clearly executed. It seems like the indices are still being written to, though. If documents keep arriving in Elasticsearch, the index gets re-created.
So for example:
The index from 2020.01.27 took 53 GB before the deletion. After you delete it, the data is gone and the index itself too. But as soon as new documents for that very same day (2020.01.27) get indexed, the index is re-created, containing only the documents indexed after the deletion, which is probably the ~5 GB you see.
If this is not what you want, you need to check if there are some sources still sending data.
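For example, one quick way to check is to watch the document count for those indices over a few minutes (default local host/port assumed; the index pattern is taken from the names in the question):
curl 'http://localhost:9200/_cat/indices/.app-search-analytics-logs-*?v&h=index,docs.count,store.size'
If docs.count keeps growing between runs, something (most likely App Search analytics logging) is still writing to these indices.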
Hope this helps.
EDIT:
Q: However, is there a way to manage these indices? I don't want them to eat up too much space.
Yes! Index Lifecycle Management (ILM) is what you are looking for. It aims to automate the maintenance/management of indices. So, for example, you could define a rollover to a new index every 30 GB in order to keep individual indices small. Another example is deleting an index after X days. Take a look at all the phases and actions.
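As a rough sketch, a policy combining both examples could look like this (the policy name, the 30 GB rollover threshold, and the 30-day delete age are just illustrative values):
curl -XPUT 'http://localhost:9200/_ilm/policy/my-logs-policy' -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "30gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'
The policy only takes effect once it is attached to your indices, typically via an index template and a write alias for the rollover part.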
I have a Graylog 2.1 server that has been running for some time. I hadn't paid attention to my retention rate recently and came in this morning to find Graylog partially crashed because the disk was out of space. Nearly 100% of the disk space is currently being taken up by Elasticsearch shards. The web interface for Graylog is not usable in its current state. I tried some of the standard Ubuntu tricks for freeing up disk space, like apt-get autoremove and clean, but wasn't able to free enough space to get the web interface functional.
The problem is that all of the documentation I can currently find for changing the retention rate and cycling the shards is for the web interface. The relevant config options no longer appear to be present in the Graylog config file.
Does anyone know of a manual, CLI, way of purging data from the Elasticsearch Shards in Graylog 2.1?
First aid: check which indices are present:
curl http://localhost:9200/_cat/indices
Then delete the oldest indices (you should not delete all of them):
curl -XDELETE http://localhost:9200/graylog_1
curl -XDELETE http://localhost:9200/graylog_2
curl -XDELETE http://localhost:9200/graylog_3
Fix: You can then reduce the parameter elasticsearch_max_number_of_indices in /etc/graylog/server/server.conf to a value that fits your disk.
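For instance, a quick CLI check that the setting is actually in the config file, followed by a restart so Graylog picks up an edited value (the config path is the Ubuntu package default and systemd is assumed; the edit itself is up to you):
grep -n elasticsearch_max_number_of_indices /etc/graylog/server/server.conf
sudo nano /etc/graylog/server/server.conf
sudo systemctl restart graylog-server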
If Elasticsearch is still starting, you can simply delete indices with the Delete Index API. Apart from using Graylog directly (System / Indices page in the web interface), that is the preferred way of getting rid of Elasticsearch indices.
If you're totally screwed (i.e. neither Graylog nor Elasticsearch is starting), you can still delete the complete data from Elasticsearch's data path (see Directory Layout).
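As a very rough last-resort sketch only (this wipes ALL Elasticsearch data on the node, not just old Graylog indices; the data path below is the deb/rpm default and may differ on your system):
# Stop both services first
sudo systemctl stop graylog-server elasticsearch
# DESTRUCTIVE: removes every index stored by this node
sudo rm -rf /var/lib/elasticsearch/*
# Start Elasticsearch again, then Graylog, which re-creates its write index on startup
sudo systemctl start elasticsearch
sudo systemctl start graylog-server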
There is a list of indices under the Graylog admin panel,
"/system/indices"
There is a delete button for each index. You can check old indices and delete them if they are no longer required.
You can also delete Elasticsearch log files older than 7 days:
sudo find /var/log/elasticsearch/ -type f -mtime +7 -delete
You should set up a retention strategy from within Graylog. If you manage the indices yourself and delete the wrong index, you might break your Graylog.
Go to System / Indices. Select the default index set, then Edit index set, and there you'll find index rotation and retention.
I have an ELK Stack set up and accepting log data from 2 of my applications and everything is working OK. It's been running for 25 days and I have nearly 4 GB of data/documents on a 25 GB server.
My question
I have 8 applications in total that I would like to hook up to my ELK Stack.
Is one cluster OK for this, or do I need to add more clusters, say a cluster for each application's data? If so, how do I do that without having to re-index my data?
Why does cluster health say "yellow (244 of 488)"?
Should each application index into its own index rather than the default "logstash-{todays-date}", like my-app-1-{todays-date}, my-app-2-{todays-date}, etc.?
your help is greatly appreciated
G
Your cluster is yellow because your logstash-* indices are configured with 1 replica and you probably have a single node. 244 of 488 means that you have 488 shards in all your indices but only 244 are assigned on your single node and 244 remain to be assigned to new nodes. This is not a problem per se, but if your current node were to fail for some reason, you'd probably lose some data, whereas if you had 2+ nodes, the data would be replicated on other nodes, your cluster would be green (and you'd see 488 of 488) and you'd have a lower risk of losing data.
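You can see this for yourself with the cluster health and cat shards APIs (default local host/port assumed):
# Shows status plus the active and unassigned shard counts behind "244 of 488"
curl 'http://localhost:9200/_cluster/health?pretty'
# Lists the individual shards that have no node to go to (the replica copies)
curl 'http://localhost:9200/_cat/shards?v' | grep UNASSIGNED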
As for your second question, nothing prevents you from storing all the logs from your eight applications in the same daily logstash indices. You just need to make sure that your logstash configuration accounts for each different app and adds one field with the application name (e.g. app: app1, app: app2, etc.) to the indexed log events, so that you can then distinguish within Kibana which app each log event came from.
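Once that field is in place, filtering per application is trivial; for example (the field name app is just the illustration used above):
curl 'http://localhost:9200/logstash-*/_search?q=app:app1&pretty'
In Kibana the equivalent is simply querying or filtering on app:app1.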
I have only used Elasticsearch and not the complete ELK stack, but I can give some ideas and guess what is going on. 488 = 2 × 244, so I guess there are unassigned replica shards in the single-machine cluster. You can update this setting ad hoc and set it to zero:
curl -XPUT 'http://localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '
{"index": {"number_of_replicas": 0}}'
You should update the logstash index template not to use replicas when you are running just a single machine. Also, your shards seem to be only about 20 MB in size, so I'd recommend each index use just one shard instead of five; each shard consumes extra resources. Having multiple shards increases indexing speed but slows down queries, so you should check whether one is sufficient for you.
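A minimal sketch of such a template (the template name is arbitrary; on Elasticsearch 6.x and later the pattern key is index_patterns instead of template, and only newly created daily indices are affected, not existing ones):
curl -XPUT 'http://localhost:9200/_template/logstash_single_node' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'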
One index per application per day would speed up querying if dashboards are mostly application-specific, and you can create a day-specific alias to be used by cross-application queries.
On my elasticsearch server:
total documents: 3 million, total size: 3.6G
Then, I deleted about 2.8 million documents:
total documents: about 0.13 million, total size: 3.6G
I have deleted the documents, so how can I free up the disk space they were using?
Deleting documents only flags them as deleted so that they are no longer returned by searches. To reclaim disk space, you have to optimize the index:
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
documentation: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html
The documentation has moved to:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html
Update
Starting with Elasticsearch 2.1.x, optimize is deprecated in favor of forcemerge.
The API is the same, only the endpoint did change.
curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
In the current Elasticsearch version (7.5):
To optimize all indices:
POST /_forcemerge?only_expunge_deletes=true
To optimize a single index (where twitter is the index):
POST /twitter/_forcemerge?only_expunge_deletes=true
To optimize several indices (where twitter and facebook are the indices):
POST /twitter,facebook/_forcemerge?only_expunge_deletes=true
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/indices-forcemerge.html#indices-forcemerge
knutwalker's answer is correct. However, if you are using AWS Elasticsearch and want to free storage space, this will not quite work.
On AWS, the index to forcemerge must be specified in the URL. It can include wildcards, as is common with index rotation.
curl -XPOST 'https://something.es.amazonaws.com/index-*/_forcemerge?only_expunge_deletes=true'
AWS publishes a list of ElasticSearch API differences.
I just want to note that the 7.15 docs for the Force Merge API include this warning:
Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.
So you should shut down writes to the index before beginning.
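One way to do that is to put a write block on the index before merging (my-index is a placeholder):
curl -XPUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '
{ "index.blocks.write": true }'
Setting index.blocks.write back to false afterwards re-enables indexing.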
Replace indexname with your index name. It will immediately free up space (note that _forcemerge takes its options as query parameters, not a request body):
curl -XPOST 'http://localhost:9200/indexname/_forcemerge?only_expunge_deletes=false&max_num_segments=1'