I am using the ELK stack to analyze logs. With the default configuration, ES creates a new index named "logstash-YYYY-MM-DD" each day.
Suppose I have configured Logstash to read a path like this:
/var/log/rsyslog/**/2014-12-0[1-7]/auditd.log
Logstash then reads old logs, but the index created is named "logstash-2015-03-20", so this index contains documents (logs) from earlier dates.
My problem occurs when I have to delete indexes. Suppose I want to keep only the last week's data and purge older indices. If I delete every index except those named for the last 7 days, I have no way of tracking which day's logs are stored in which index. E.g., the logs dated 2014-12-07 may be in either logstash-2015-03-19 or logstash-2015-03-20.
So how should I delete indexes?
Log messages are stored in indexes based on the value of the @timestamp field (which uses UTC time). If your 2014-12-07 logs end up in the 2015-03-19 index, the timestamp in the log line isn't being parsed correctly, so the events are stamped with the time they were read instead.
Correct the problem by adding a grok and/or date filter. Your 2014-12-07 logs will then end up in the logstash-2014.12.07 index, and it'll be trivial to clean up old logs.
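To illustrate why a correctly parsed event time makes cleanup trivial, here is a minimal Python sketch (not Logstash code) of how a daily index name follows from an event's UTC timestamp, and how indices outside a retention window can then be picked by name alone. The function names and the 7-day default are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta, timezone

PREFIX = "logstash-"

def index_for_event(event_time: datetime) -> str:
    """Daily index name derived from the event's own (timezone-aware) time, in UTC."""
    return PREFIX + event_time.astimezone(timezone.utc).strftime("%Y.%m.%d")

def expired_indices(names, today: date, keep_days: int = 7):
    """Return the daily index names dated before the last `keep_days` days."""
    cutoff = today - timedelta(days=keep_days - 1)  # oldest day still kept
    expired = []
    for name in names:
        if not name.startswith(PREFIX):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Once every event lands in the index named after its own date, the list returned by expired_indices is exactly the set of indices that hold only old logs.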
Related
I am using Elastic's ILM (Index Lifecycle Management) to index my live data (emails received).
The policy is set to roll over to a new index every 30 days.
The index template pattern is: WikiEmail-*.
So a new index is created every 30 days, named WikiEmail-000001 and so forth.
Now I have a requirement to index historical data (older emails from the past few years).
How do I index the older data in the same monthly fashion?
Is there a way to customize the index name in ILM so that the starting index name is WikiEmail-0000099?
In that case, I could index the older documents by creating corresponding indices in the warm phase named WikiEmail-0000098, WikiEmail-0000097, and so on.
You will run into issues here, because the ILM policy looks at the index creation date when it comes to retention. Your old data may therefore stay around longer than more recent data.
If you want this data to be accessible under the ILM read alias, index it into indices with whatever names you want, then attach them to that read alias.
The only caveat is that you will need to manage retention manually for those indices.
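As a rough sketch of that approach, the Python below derives a monthly index name from an email's date and builds the JSON body for Elasticsearch's `POST /_aliases` call that attaches those indices to a read alias. The prefix `emailhistory` and the alias `wikiemail-read` are hypothetical names, not taken from the question. They are lowercase because Elasticsearch rejects index names containing uppercase letters. Nothing is sent to a cluster here.

```python
import json
from datetime import date

def monthly_index(prefix: str, received: date) -> str:
    """Name of the monthly index that an email received on `received` goes into."""
    return f"{prefix}-{received:%Y.%m}"

def add_to_alias_body(indices, alias: str) -> str:
    """JSON body for POST /_aliases that adds every index in `indices` to `alias`."""
    actions = [{"add": {"index": name, "alias": alias}} for name in indices]
    return json.dumps({"actions": actions})

# Hypothetical example: two historical months attached to one read alias.
historical = [monthly_index("emailhistory", d) for d in (date(2019, 1, 15), date(2019, 2, 3))]
body = add_to_alias_body(historical, "wikiemail-read")
```

Because these indices sit outside the rollover series, deleting them when they age out remains a manual (or scripted) step.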
Our Elasticsearch indexes keep getting bigger and bigger, and then on some days the indexes are small. On the days the indexes are small, no machine is down; everything is the same as on the days the indexes are big.
I noticed that Elasticsearch still stores documents in the indexes from previous days.
Is it possible that Elasticsearch is piling up documents from previous days into the current day's index? How does Elasticsearch decide which index a document is stored in?
We had to reduce the number of days the indexes are kept, since on some days one index is 2x the size of another.
Thanks
@maryf there are 2 possibilities here if the indexes are date-based:
1. The log shipper is not persisting the registry, which records which log files have been harvested and up to what offset.
2. The index is defined to use an incorrect timestamp field for its timeline.
In the first case, whenever your log shipper restarts, it starts reading the log files from the beginning, and you will see duplicate records in your index. In the second case, logs are stored in an index based on the timestamp field being used. If the timestamp is from an older date, the log is stored in the older index matching that date.
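The first case can be pictured with a small Python sketch. It is a simplification: a real log shipper tracks byte offsets per file, whereas this hypothetical `read_new_lines` helper counts lines and keeps the registry in a plain dict.

```python
def read_new_lines(lines, registry, path):
    """Return the lines of `path` not yet harvested and record the new offset."""
    start = registry.get(path, 0)  # file unknown to the registry: start at the beginning
    new_lines = lines[start:]
    registry[path] = len(lines)    # remember how far we have read
    return new_lines

log = ["event 1", "event 2"]

# Registry kept across runs: the second pass ships nothing new.
registry = {}
first_pass = read_new_lines(log, registry, "app.log")
second_pass = read_new_lines(log, registry, "app.log")

# Registry lost on restart: every line is shipped again, producing duplicates.
after_restart = read_new_lines(log, {}, "app.log")
```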
I have configured an ELK cluster with 5 nodes, one being the master and the others slaves.
I index logs in the cluster once a day using Logstash. I use a cron job (script) to copy
the log files to the configured Logstash directory. I have also manually set a .sincedb path for Logstash.
However, a tricky thing happens. Almost every 3 days, the index seems to lose documents, deleting everything prior to a certain date. I haven't configured any ILM policy, nor is there any script performing a delete by query or deleting the full index. Even when calling _cat/indices formatted to show the creation date of the index, I see that it was created almost 2 weeks ago. However, the documents that should be there for those 2 weeks aren't there anymore; even today it only has documents from 3 days ago.
Does anyone know why this behaviour could be happening, or what could trigger it?
Quick question related to Logstash and Elasticsearch:
If I use the following output index option:
index => "myindex-%{+MM}"
Will this overwrite the oldest index once it reaches a year or just add to it?
Logstash will never delete existing indexes, so when the year ends, the oldest index will be written to again.
Nothing will get overwritten AS LONG AS the document ids of new documents are always different from the ids of the documents already existing in that index.
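A small Python sketch of both points, under stated assumptions: `month_index` mimics the `%{+MM}` pattern by formatting only the month of the event's UTC time, and a plain dict stands in for an index keyed by document id. Both helpers are illustrative, not Logstash or Elasticsearch APIs.

```python
from datetime import datetime, timezone

def month_index(event_time: datetime) -> str:
    """Mimic index => "myindex-%{+MM}": only the month of the UTC event time is used."""
    return "myindex-" + event_time.astimezone(timezone.utc).strftime("%m")

def index_doc(index: dict, doc_id: str, doc: dict) -> None:
    """Indexing with an existing id replaces that document; a new id adds one."""
    index[doc_id] = doc

# Events one year apart map to the same index name.
jan_2015 = month_index(datetime(2015, 1, 10, tzinfo=timezone.utc))
jan_2016 = month_index(datetime(2016, 1, 10, tzinfo=timezone.utc))

myindex_01 = {}
index_doc(myindex_01, "a", {"year": 2015})
index_doc(myindex_01, "b", {"year": 2016})  # different id: added alongside "a"
index_doc(myindex_01, "a", {"year": 2016})  # same id: replaces the 2015 document
```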
I work on several log files that I process with logstash. I divide them into several documents (multiline) and then I extract the information I want.
The problem is that I end up with several documents that contain nothing interesting and take up space.
Do you know a way to delete the documents from which Logstash extracted no information?
Thank you very much for your help !
In older versions of Elasticsearch, you could specify a ttl field when creating an index to indicate when a document in that index expires. You could set the ttl to a value of, say, 24 hours.
However, ttl has been deprecated as of version 2.0, since it's a clumsy way of removing stale data. Personally, I create rolling indexes with Logstash and have a cron job that simply drops the daily index at the end of the day via curl.
Also refer to this article from ES
https://www.elastic.co/guide/en/elasticsearch/guide/current/retiring-data.html
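The cron-driven cleanup described above uses curl. As a hedged Python equivalent, the sketch below builds the `DELETE /<index>` request for one daily index. The host `http://localhost:9200` and the `logstash-YYYY.MM.dd` naming are assumptions, and the request is only constructed here, not sent.

```python
from datetime import date
from urllib.request import Request

def delete_request(day: date, host: str = "http://localhost:9200") -> Request:
    """Build (but do not send) the HTTP DELETE request that drops one daily index."""
    index = f"logstash-{day:%Y.%m.%d}"
    return Request(f"{host}/{index}", method="DELETE")

req = delete_request(date(2015, 3, 19))
```

Passing the request to `urllib.request.urlopen` would actually drop the index, so a scheduled job would call it with the day that has just fallen out of the retention window.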