How to manually purge data from Graylog 2.1 - elasticsearch

I have a Graylog 2.1 server that has been running for some time. I hadn't paid attention to my retention rate recently and came in this morning to find Graylog partially crashed because the disk was out of space. Nearly 100% of the disk space is currently taken up by Elasticsearch shards. The Graylog web interface is not usable in this state. I tried some of the standard Ubuntu tricks for freeing up disk space, like apt-get autoremove and apt-get clean, but couldn't free enough space to get the web interface functional.
The problem is that all the documentation I can currently find for changing the retention rate and cycling the shards goes through the web interface, and the relevant config options no longer appear to be present in the Graylog config file.
Does anyone know of a manual, CLI, way of purging data from the Elasticsearch Shards in Graylog 2.1?

First aid: check which indices are present:
curl http://localhost:9200/_cat/indices
Then delete the oldest indices (you should not delete all of them):
curl -XDELETE http://localhost:9200/graylog_1
curl -XDELETE http://localhost:9200/graylog_2
curl -XDELETE http://localhost:9200/graylog_3
Fix: You can then reduce the parameter elasticsearch_max_number_of_indices in /etc/graylog/server/server.conf to a value that fits your disk.
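If you want to script which indices to delete, here is a minimal sketch, assuming Graylog's default graylog_&lt;number&gt; naming where a higher number means a newer index (it uses GNU sort/head):

```shell
# Print all but the newest $1 indices read from stdin, assuming Graylog's
# default graylog_<number> naming (higher number = newer index).
oldest_indices() {
  sort -t_ -k2,2n | head -n -"$1"   # GNU head: drop the last $1 lines
}

# Example: keep the 2 newest, print the rest (candidates for deletion):
printf 'graylog_3\ngraylog_1\ngraylog_4\ngraylog_2\n' | oldest_indices 2
```

You could then feed each printed name into a curl -XDELETE call like the ones above.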

If Elasticsearch is still starting, you can simply delete indices with the Delete Index API, which is, after using Graylog directly (System / Indices page in the web interface), the preferred way of getting rid of Elasticsearch indices.
If you're totally screwed (i.e. neither Graylog nor Elasticsearch is starting), you can still delete all data from Elasticsearch's data path (see Directory Layout).
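As a destructive last resort, wiping the data path can be sketched like this. The default path below is an assumption (Debian/Ubuntu package layout); check path.data in /etc/elasticsearch/elasticsearch.yml first, and stop the services before wiping:

```shell
# Destructive last resort: wipe Elasticsearch's on-disk data entirely.
# The default data path is an assumption (Debian/Ubuntu package layout);
# verify path.data in /etc/elasticsearch/elasticsearch.yml first.
wipe_es_data() {
  data_path="${1:-/var/lib/elasticsearch}"
  rm -rf "${data_path:?}/nodes"   # ${var:?} aborts if the path is empty
}

# Typical sequence (run as root, with services stopped):
#   systemctl stop graylog-server elasticsearch
#   wipe_es_data
#   systemctl start elasticsearch graylog-server
```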

There is a list of indexes in the Graylog admin panel under
"/system/indices"
There is a delete button for each index. You can check for old indexes and delete them if they are not required.
You can also delete Elasticsearch's own log files older than 7 days (note this frees log space, not index data):
sudo find /var/log/elasticsearch/ -type f -mtime +7 -delete

You should set up a retention strategy from within Graylog. If you manage the indices yourself and delete the wrong index, you might break your Graylog setup.
Go to System / Indices. Select the default index set, then "Edit index set", and there you'll find index rotation and retention settings.

Related

Elasticsearch count of searches against an index resets to zero after cluster restart

We use Elasticsearch - one cluster is 7.16 and another is 8.4. Behavior is the same in both.
We need to be able to get a count of search queries run against an index since the index's creation.
We retrieve the number of searches that have been run against a given index by using the _stats endpoint, as such:
GET /_stats?filter_path=indices.my_index.primaries.search.query_total
The problem is that this stat resets to zero after a cluster reboot. Does this data persist anywhere for a given index such that I can get the total since inception of the index? If not, is there an action I can take to somehow record that stat before a reboot so I can always access the full total number?
EDIT - this is the only item I was able to find on this subject, and the answer in this discussion does not look promising: https://discuss.elastic.co/t/why-close-reopen-index-will-reset-index-stats-to-zero/170830
As far as I know, there is no out-of-the-box solution for this use case, but it's not that hard to build yourself either. You can simply call the same _stats API periodically and store the result in another Elasticsearch index or a database so that it isn't reset. IMHO it's not that much work.
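One way to sketch the periodic recording: pull the counter out of the _stats response and append it with a timestamp. The index name, log file, and cron usage below are placeholders, and the extraction is a crude grep to avoid a jq dependency:

```shell
# Extract query_total from the _stats JSON on stdin (assumes the filter_path
# query from above, so exactly one query_total appears in the response).
extract_query_total() {
  grep -o '"query_total":[0-9]*' | cut -d: -f2
}

# Run periodically (e.g. from cron) to build a history that survives restarts:
#   curl -s 'localhost:9200/_stats?filter_path=indices.my_index.primaries.search.query_total' \
#     | extract_query_total \
#     | xargs -I{} echo "$(date -u +%FT%TZ) {}" >> query_totals.log
```

With timestamped snapshots on disk, the total since index creation can be reconstructed by summing the deltas across restarts.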

How to properly delete AWS ElasticSearch index to free disk space

I am using AWS ElasticSearch, and publishing data to it from AWS Kinesis Firehose delivery stream.
In the Kinesis Firehose settings I specified a rotation period of 1 month for the ES index. Every month Firehose will create a new index for me, appending a month timestamp. As I understand it, the old index will still be present; it won't be deleted.
Questions I have:
With new index being created each month with different name, do I need to recreate my Kibana dashboards each month?
Do I need to manually delete old index every month to clean disk space?
In order to clean disk space, is it enough just to run CURL command to delete the old index?
With new index being created each month with different name, do I need to recreate my Kibana dashboards each month?
No, you will need to create an index pattern on kibana, something like kinesis-*, then you will create your visualizations and dashboards using this index pattern.
Do I need to manually delete old index every month to clean disk space?
It depends on which version of Elasticsearch you are using. The latest versions have Index Lifecycle Management built into the Kibana UI; if your version does not have it, you will need to do it manually or use Curator, an Elasticsearch Python application for these housekeeping tasks.
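For the Curator route, an action file along these lines would delete old monthly indices by name. The kinesis- prefix, the %Y-%m timestring, and the 2-month threshold are assumptions; adjust them to your Firehose naming and retention needs:

```yaml
actions:
  1:
    action: delete_indices
    description: Delete kinesis-* indices older than 2 months, by index name
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: kinesis-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y-%m'
        unit: months
        unit_count: 2
```

You would then run Curator on a schedule (e.g. a daily cron job) pointing at this action file.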
In order to clean disk space, is it enough just to run CURL command to delete the old index?
Yes, if you delete an index it will free the space used by that index.
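If you want to automate that curl deletion, a small sketch for computing last month's index name follows. The kinesis-YYYY-MM pattern is an assumption (match it to your Firehose index rotation), and it relies on GNU date:

```shell
# Compute last month's index name from a reference date (default: today).
# The kinesis-YYYY-MM naming is an assumption; requires GNU date.
last_month_index() {
  ref="${1:-$(date +%F)}"
  date -d "$ref -1 month" +kinesis-%Y-%m
}

# Then something like:
#   curl -XDELETE "https://your-es-endpoint/$(last_month_index)"
```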

Visualize Elasticsearch index size in Kibana

Is it possible to show the size (physical size, e.g. MB) of one or more ES indices in Kibana?
Thanks
Kibana only:
It's not possible out of the box to view the disk-size of indices in Kibana.
Use the cat command to find out how big your indices are (that's even possible without Kibana).
If you need to view that data in Kibana, index the output of the cat command into a dedicated Elasticsearch index and analyse it there.
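That pipeline can be sketched as follows. The target index name "index-sizes" and the field names are assumptions, and the JSON conversion is split out so the network step stays optional:

```shell
# Turn "<index-name> <size-in-bytes>" lines (as produced by _cat/indices
# with h=index,store.size&bytes=b) into one JSON document per line.
to_size_docs() {
  while read -r name size; do
    printf '{"index":"%s","size_bytes":%s}\n' "$name" "$size"
  done
}

# Feeding it live data and indexing each document for Kibana to chart:
#   curl -s 'localhost:9200/_cat/indices?h=index,store.size&bytes=b' | to_size_docs \
#     | while read -r doc; do
#         curl -s -XPOST 'localhost:9200/index-sizes/_doc' \
#              -H 'Content-Type: application/json' -d "$doc"
#       done
```

Run on a schedule, this gives you a time series of index sizes you can visualize in Kibana.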
If other plugins/tools than Kibana are acceptable, read the following:
Check the Elasticsearch community plugins. The Head plugin (which I would recommend) gives you the info you want in addition to much other information, like stats about your shards, nodes, etc...
Alternatively you could use the commercial Marvel plugin from Elastic. I have never used it before, but it should be capable of what you want, and much more. Marvel is likely overkill for what you want, though, so I wouldn't recommend it in the first place.
Although not a Kibana plugin, cerebro is the official replacement for Kopf and runs as a standalone web server that can connect remotely to Elasticsearch instances. The UI is very informational and functional.
https://github.com/lmenezes/cerebro

Elasticsearch : How to get all indices that ever existed

Is there a way to find out the names of all the indices ever created, even after an index has been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.

Backing up, Deleting, Restoring Elasticsearch Indexes By Index Folder

Most of the ElasticSearch documentation discusses working with the indexes through the REST API - is there any reason I can't simply move or delete index folders from the disk?
You can move data around on disk, to a point -
If Elasticsearch is running, it is never a good idea to move or delete the index
folders, because Elasticsearch will not know what happened to the data, and you
will get all kinds of FileNotFoundExceptions in the logs as well as indices
that are red until you manually delete them.
If Elasticsearch is not running, you can move index folders to another node (for
instance, if you were decommissioning a node permanently and needed to get the
data off), however, if you delete or move the folder to a place where
Elasticsearch cannot see it when the service is restarted, then Elasticsearch
will be unhappy. This is because Elasticsearch writes what is known as the
cluster state to disk, and in this cluster state the indices are recorded, so if
ES starts up and expects to find index "foo", but you have deleted the "foo"
index directory, the index will stay in a red state until it is deleted through
the REST API.
Because of this, I would recommend that if you want to move or delete individual
index folders from disk, that you use the REST API whenever possible, as it's
possible to get ES into an unhappy state if you delete a folder that it expects
to find an index in.
EDIT: I should mention that it's safe to copy (for backups) an indices folder,
from the perspective of Elasticsearch, because it doesn't modify the contents of
the folder. Sometimes people do this to perform backups outside of the snapshot
& restore API.
I use this procedure: I close, back up, and then delete the indexes.
curl -XPOST "http://127.0.0.1:9200/*index_name*/_close"
After this point all index data is on disk and in a consistent state, and no writes are possible. I copy the directory where the index is stored and then delete it:
curl -XDELETE "http://127.0.0.1:9200/*index_name*"
By closing the index, Elasticsearch stops all access to it. Then I send a command to delete the index (and all corresponding files on disk).