Elasticsearch deletes documents in an index automatically

I have configured an ELK cluster with 5 nodes, one being the master and the others slaves.
I index logs into the cluster once a day using Logstash. I use a cron job (script) to copy
the log files to the configured Logstash directory. I have also manually set a .sincedb path for Logstash.
However, a tricky thing happens. Almost every 3 days, the index seems to be losing documents, deleting everything prior to certain dates. I haven't configured any ILM policy, nor is there any script performing a delete by query or deleting the full index. Even when calling _cat/indices formatted to show the creation date of the index, I see that it was created almost 2 weeks ago. However, the documents from those 2 weeks aren't there anymore; as of today it only had documents from the last 3 days.
Does anyone know why this behaviour could be happening, or what can trigger it?
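For reference, the kind of check described above can be run roughly like this (the index name my-logs is a placeholder, and the _ilm/explain call is only available on ES 6.6+):

# Show the creation date and document counts for the index
curl "http://localhost:9200/_cat/indices/my-logs?v&h=index,creation.date.string,docs.count,docs.deleted"

# Confirm that no ILM policy is attached to the index
curl "http://localhost:9200/my-logs/_ilm/explain?pretty"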

Related

Elasticsearch snapshot how it works

I want to understand how snapshots work in Elasticsearch.
Case 1
Snapshots are taken every day and snapshots older than 1 month are deleted.
I have an index cities with, for example, 3 documents
{ barcelona, madrid, urumqi }. If, for example, I delete the barcelona document from the index, is it true that once a month passes and the last snapshot containing this document is deleted, I can no longer recover this document?
Case 2
I have an Elasticsearch cluster and a fairly large number of indices, with a rotation of 3 months. If, for example, a couple of indices change or all of them are deleted, and I then restore from a snapshot that was taken 3 months ago, will my cluster be fully restored to the data from 3 months ago? Will the snapshot process overwrite all the data or not?
If you delete the snapshots that cover an index, then you cannot recover any of the data in that index. So no, you cannot recover the document.
A restore will restore the data from the time the snapshot was taken, which means yes, the full data from 3 months ago is what you will see.
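As a rough illustration of case 2 (the repository name my_backup and snapshot name snapshot_3m_ago are made up), restoring over an existing index replaces its contents with what the snapshot contained:

# Close the existing index before restoring over it
curl -X POST "http://localhost:9200/cities/_close"

# Restore the index from the 3-month-old snapshot
curl -X POST "http://localhost:9200/_snapshot/my_backup/snapshot_3m_ago/_restore" -H 'Content-Type: application/json' -d '{ "indices": "cities" }'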

Elasticsearch delete documents from index

I have an Elasticsearch cluster on Kubernetes, and I also have a Curator job that deletes indices older than 7 days.
I want to change the Curator to work according to a certain condition:
If a document has key1=value1, delete it after 10 days; otherwise delete it after 7 days.
Is there any way to do it?
Curator is limited to index deletion as a whole and not at the document level.
What Curator does under the hood is call DELETE index-name, and there is no way to configure it to call the delete-by-query API, which is what you're asking for.
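If you wanted to approximate the requested behaviour outside Curator, it would have to be a scheduled delete-by-query along these lines (a sketch only; the index name and the @timestamp field are assumptions about how the documents record their age):

# Delete documents with key1=value1 that are older than 10 days
curl -X POST "http://localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "key1": "value1" } },
        { "range": { "@timestamp": { "lt": "now-10d" } } }
      ]
    }
  }
}'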

How to check the index is used for searching or indexing

I have a lot of Elasticsearch clusters which hold historical indices (more than 10 years old). Some of these indices have been newly created with the latest settings and fields, but the old ones are not deleted.
Now I need to delete the old indices which are not receiving any search or index requests.
I've already looked at Elasticsearch Curator, but it would not work with the older version of ES.
Is there any API which gives the last time an index or search request was made in ES? That would serve my purpose very well.
EDIT: I've also checked https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html, but this also doesn't give the last time an indexing or search request came; all it gives is the number of these requests since the last restart.
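For completeness, the counters the EDIT refers to can be pulled per index like this (old-index is a placeholder); they are totals only, with no timestamps attached:

# Total search and indexing operations, but no "last seen" time
curl "http://localhost:9200/old-index/_stats/search,indexing?pretty"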

Documents in Elasticsearch getting deleted automatically?

I'm creating an index through Logstash and pushing data to it from a MySQL database. But what I noticed in Elasticsearch was that once all the data is uploaded, it starts deleting some of the docs. The total number of docs is 160729. Without the scheduler it works fine.
I added the cron scheduler so that Logstash checks whether new rows have been added to the table. Can that be the issue?
My logstash conf looks like this.
Where am I going wrong? Or is this behavior common?
Any help could be appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will be eventually cleaned up as Lucene merges segments in the background.
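A quick way to observe this (my-index is a placeholder): docs.deleted is visible through the cat API, and a force merge with only_expunge_deletes cleans the tombstones up explicitly, though normally the background merges take care of it on their own:

# Watch the deleted-docs counter for the index
curl "http://localhost:9200/_cat/indices/my-index?v&h=index,docs.count,docs.deleted"

# Optional: expunge deleted documents explicitly (usually unnecessary)
curl -X POST "http://localhost:9200/my-index/_forcemerge?only_expunge_deletes=true"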

Last updated time for an index in Elasticsearch

I have a use case where I ran a batch job to first create and then subsequently update my index in Elasticsearch.
My program crashed prematurely, and now I want to know the last time an update was made to my Elasticsearch index.
Is there any API which could give me the last update time of the index?
I have not been able to find any such resources. I looked specifically in https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
and tried,
curl http://{myhost}/{indexName}/_stats
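That call only returns counters, not timestamps. One common workaround, assuming the documents themselves carry a timestamp field (here @timestamp, which is an assumption about the mapping), is a max aggregation over that field:

# Most recent document timestamp in the index
curl -X POST "http://{myhost}/{indexName}/_search" -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "last_update": { "max": { "field": "@timestamp" } }
  }
}'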
