Elasticsearch rolling upgrade - do deleted documents get removed? - elasticsearch

I'm going to upgrade Elasticsearch from 5.6 to 6.8 following the rolling upgrade procedure.
I have an index that holds 54,172,622 documents across 5 primary shards, with 1 replica of each. There are 21,696,332 deleted documents in the index.
When I follow the rolling upgrade procedure, will the procedure automatically purge the deleted documents, or is it better to reindex to a new index before upgrading? I assume the upgrade is slower if the deleted documents are included.

"When I follow the rolling upgrade procedure, will the procedure automatically purge the deleted documents?"
No, upgrading will NOT modify your docs.count or docs.deleted. The counts will remain the same.
"Is it better to reindex to a new index before upgrading?"
Wanting to upgrade doesn't by itself mean you need to reindex. It depends. If your index was created in a version prior to 5.x, then you need to reindex it, because 6.x cannot read indices created before 5.0. The best way to determine this is to run the Upgrade Assistant tool in Kibana. You'll need to reindex some internal indices like .kibana, .security, .tasks, .watches, and the Upgrade Assistant will help to reindex those indices. It will also tell you whether your main index containing 54,172,622 docs needs to be reindexed. See https://www.elastic.co/guide/en/elastic-stack/6.8/upgrading-elastic-stack.html
"I assume the upgrade is slower if the deleted documents are included."
I don't think the value of docs.deleted impacts the upgrade process. It's just a count.
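If you want to see how large the deleted share actually is before and after the upgrade, the counts can be read from the index stats. Below is a minimal Python sketch, assuming a response shaped like the one returned by GET /<index>/_stats/docs and assuming the numbers in the question are primary-shard counts. The index name my-index and both helper names are hypothetical. The second helper only builds the path for a force merge with only_expunge_deletes, which is one way to reclaim deleted documents without reindexing; it sends nothing.

```python
from urllib.parse import quote


def deleted_ratio(stats: dict, index: str) -> float:
    """Fraction of documents in the primary shards that are marked deleted."""
    docs = stats["indices"][index]["primaries"]["docs"]
    live, deleted = docs["count"], docs["deleted"]
    total = live + deleted
    return deleted / total if total else 0.0


def expunge_deletes_path(index: str) -> str:
    """Request path for a force merge that only targets segments with deletes."""
    return f"/{quote(index, safe='')}/_forcemerge?only_expunge_deletes=true"


# Hypothetical response fragment using the counts from the question.
sample = {
    "indices": {
        "my-index": {
            "primaries": {"docs": {"count": 54172622, "deleted": 21696332}}
        }
    }
}

print(f"{deleted_ratio(sample, 'my-index'):.1%} of documents are marked deleted")
print("POST", expunge_deletes_path("my-index"))
```

Running the same check after the upgrade should show whether the counts changed; background segment merges can lower docs.deleted at any time, independently of the upgrade.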

Related

How to check the index is used for searching or indexing

I have a lot of Elasticsearch clusters that hold historical indices (more than 10 years old). Some of these indices have been newly created with the latest settings and fields, but the old ones were not deleted.
Now I need to delete the old indices that are not receiving any search or index requests.
I've already tried Elasticsearch Curator, but it does not work with older versions of ES.
Is there any API that gives the time of the last index and search request for an index? That would serve my purpose very well.
EDIT: I've also checked https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html but it doesn't give the time of the last indexing or search request either. All it gives is the number of these requests since the last restart.

How to know an Index's ElasticSearch version on Disk?

How can someone know the ElasticSearch version of an Index on disk? I have a case where I'd like to know what version of ElasticSearch an index was created with so that I can perform some additional steps before taking on migration of the index to a newer ES version. Like perhaps explain to a user on upgrade: "Hey, this might take a while, need to migrate your index." The assumption here is that ES is shut down at this point and I cannot directly get the ES version from ElasticSearch. Additionally, there may be more than one index and therefore more than one version for that set of indexes (not sure why that would be the case, but better to expect the worst).
Based on the Index data on disk, how can someone tell the version of ElasticSearch which produced that Index?

Documents in elasticsearch getting deleted automatically?

I'm creating an index through Logstash and pushing data to it from a MySQL database. But what I noticed in Elasticsearch was that once all the data is uploaded, it starts deleting some of the docs. The total number of docs is 160729. Without the scheduler it works fine.
I added the cron scheduler in order to check whether new rows have been added to the table. Can that be the issue?
My logstash conf looks like this.
Where am I going wrong? Or is this behavior common?
Any help would be appreciated.
The docs.deleted number doesn't mean that your documents are being deleted, but simply that existing documents are being "updated" and the older version of the updated document is marked as deleted in the process.
Those documents marked as deleted will be eventually cleaned up as Lucene merges segments in the background.
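The effect can be illustrated with a small toy model in Python. This is only a sketch of the bookkeeping described above, not Lucene's actual data structures; the class ToyIndex and its methods are hypothetical. Re-indexing a row under the same ID marks the previous copy as deleted, and a merge drops the marked copies.

```python
class ToyIndex:
    """Toy model: append-only entries, deletes are marks until merge()."""

    def __init__(self):
        # Each entry is [doc_id, source, live]; old copies stay until merge().
        self.entries = []

    def index(self, doc_id, source):
        """Index a document; an existing live copy with this ID is marked deleted."""
        for entry in self.entries:
            if entry[0] == doc_id and entry[2]:
                entry[2] = False
        self.entries.append([doc_id, source, True])

    @property
    def docs_count(self):
        return sum(1 for e in self.entries if e[2])

    @property
    def docs_deleted(self):
        return sum(1 for e in self.entries if not e[2])

    def merge(self):
        """Rewrite the entries, keeping only live documents."""
        self.entries = [e for e in self.entries if e[2]]


idx = ToyIndex()
rows = {1: "alice", 2: "bob", 3: "carol"}

# First import, then a scheduled run that sends the same rows again.
for run in range(2):
    for row_id, name in rows.items():
        idx.index(row_id, name)

print(idx.docs_count, idx.docs_deleted)  # live documents vs. marked copies
idx.merge()
print(idx.docs_count, idx.docs_deleted)  # marked copies are gone after merge
```

In this model the number of live documents never drops; only the count of superseded copies grows with each scheduled run, which matches a rising docs.deleted with a stable docs.count.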

Elasticsearch : How to get all indices that ever existed

Is there a way to find out the names of all the indices ever created, even after an index has been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.

elasticsearch update document frequently

I am playing with ES. When one updates a document in ES, ES automatically increments the version of the document.
While this is great, I wonder if ES keeps the old documents too?
If it keeps the whole old documents, the storage on disk could grow a lot if I often update documents.
So in general, I am planning to do daily updates on all documents in some index. Over 1 year I will have 365 updates on every document in one index. Is this OK to do? Will I have 365 documents stored in ES?
Is there a way to clean up old versions of the documents?
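No, it does not keep old documents; the version number is just for optimistic locking (concurrent updates). The previous copy of an updated document is only marked as deleted and is removed from disk when segments are merged.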
http://www.elasticsearch.org/blog/versioning/
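To make the optimistic locking idea concrete, here is a toy Python sketch. It is not Elasticsearch's implementation; ToyStore, VersionConflict, and the expected_version parameter are hypothetical names. Each write bumps a per-document version counter and replaces the stored source, and a write that states a stale version is rejected.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class ToyStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source); one entry per document

    def put(self, doc_id, source, expected_version=None):
        """Write a document; optionally require that it is still at expected_version."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)  # old source is replaced
        return new_version

    def get(self, doc_id):
        return self._docs[doc_id]

    def __len__(self):
        return len(self._docs)


store = ToyStore()

# One update per day for a year on the same document.
for day in range(1, 366):
    store.put("doc-1", {"day": day})

print(store.get("doc-1")[0], len(store))  # version counter vs. stored documents

# A writer that read the document earlier and still holds an old version.
try:
    store.put("doc-1", {"day": 0}, expected_version=10)
except VersionConflict as err:
    print("rejected:", err)
```

After a year of daily updates the store still holds a single document; only its version counter has grown.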
