How can someone know the ElasticSearch version of an Index on disk? I have a case where I'd like to know what version of ElasticSearch an index was created with so that I can perform some additional steps before migrating the index to a newer ES version. For example, I might tell a user on upgrade -- "Hey, this might take a while, need to migrate your index." The assumption here is that ES is shut down at this point, so I cannot get the version directly from ElasticSearch. Additionally, there may be more than one index and therefore more than one version across that set of indexes... (not sure why that would be the case, but better to expect the worst).
Based on the Index data on disk, how can someone tell the version of ElasticSearch which produced that Index?
I'm going to upgrade Elasticsearch from 5.6 to 6.8 following the rolling upgrade procedure.
I have an index which is 54,172,622 documents across 5 primary shards, with 1 replica of each. There are 21,696,332 deleted documents in the index.
When I follow the rolling upgrade procedure, will the procedure automatically purge the deleted documents, or is it better to reindex to a new index before upgrading? I assume the upgrade is slower if the deleted documents are included.
When I follow the rolling upgrade procedure, will the procedure automatically purge the deleted documents
No, upgrading will NOT modify your docs.count or docs.deleted. The counts will remain the same.
is it better to reindex to a new index before upgrading?
Wanting to upgrade doesn't by itself mean you need to reindex; it depends. If your index was created in a version prior to 5.x, then you will need to reindex it, because 6.x can only read indices created in 5.x or later. The best way to determine this is to run the Upgrade Assistant tool in Kibana. You'll need to reindex some internal indices like .kibana, .security, .tasks, .watches, and the Upgrade Assistant will help with those. It will also tell you whether your main index containing 54,172,622 docs needs to be reindexed. See https://www.elastic.co/guide/en/elastic-stack/6.8/upgrading-elastic-stack.html
I assume the upgrade is slower if the deleted documents are included.
I don't think the value of docs.deleted impacts the upgrade process. It's just a count.
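To see how large the deleted-document share actually is, the counts can be read from the index stats API (`GET /<index>/_stats/docs`). The following is a minimal sketch, assuming a node reachable over HTTP and that the figures quoted in the question are primary-shard counts; the helper names and the sample dictionary are illustrative, not part of any client library.

```python
import json
from urllib.request import urlopen


def deleted_ratio(stats: dict) -> float:
    """Fraction of documents in the primaries that are marked deleted."""
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]
    return docs["deleted"] / total if total else 0.0


def fetch_doc_stats(base_url: str, index: str) -> dict:
    """GET /<index>/_stats/docs (needs a running cluster; not called here)."""
    with urlopen(f"{base_url}/{index}/_stats/docs") as resp:
        return json.load(resp)


# Offline example using the counts quoted in the question.
sample = {"_all": {"primaries": {"docs": {"count": 54172622, "deleted": 21696332}}}}
print(f"{deleted_ratio(sample):.1%} of documents are marked deleted")
```

Against a live cluster, `deleted_ratio(fetch_doc_stats("http://localhost:9200", "my_index"))` would give the same figure for a real index; the URL and index name there are placeholders.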
We want to keep track of all the changes of a document, so we want to store all the document versions in separate index.
Is there a way when a new document is added or changes to send the entire document in another index? Maybe there is a processor for this use case?
As far as I know, Elasticsearch as such supports only version numbers but there is no way to trace back to previous version.
You could maintain the version history in a separate Elasticsearch index (called main_index_history here).
Whenever you update main_index, make sure that you also write the same document to the history index, using an ID that includes the version:
POST main_index/_doc/doc_id
POST main_index_history/_doc/doc_id_version
Maybe you can configure Logstash to do this, but I'm not sure.
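The dual write described above could be sent as a single `_bulk` request so that both copies travel together. This is only a sketch: the history index name `main_index_history` and the helper are hypothetical, the two writes are still not atomic, and on Elasticsearch 6.x and earlier each action line would also need a `_type`.

```python
import json


def history_bulk_body(doc_id: str, version: int, doc: dict,
                      main_index: str = "main_index",
                      history_index: str = "main_index_history") -> str:
    """Build an NDJSON _bulk body that writes the doc to both indices."""
    actions = [
        # Latest copy, overwritten on every update.
        {"index": {"_index": main_index, "_id": doc_id}},
        doc,
        # Immutable copy, one per version, keyed as <doc_id>_<version>.
        {"index": {"_index": history_index, "_id": f"{doc_id}_{version}"}},
        doc,
    ]
    # The bulk API expects one JSON object per line and a trailing newline.
    return "".join(json.dumps(line) + "\n" for line in actions)


body = history_bulk_body("42", 3, {"title": "hello"})
print(body, end="")
```

The resulting string would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header.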
I've recently upgraded my Elastic cluster from 1.7.5 to 2.1.2.
I've read that in version 2+ doc values are enabled by default, but I am wondering whether this applies to the upgrade I have performed. I have checked my _mapping and _settings against the cluster but can't see any references to doc values.
If my understanding of how doc values work is correct, I was hoping this would go some way towards alleviating memory consumption issues on the cluster.
After your cluster upgrade to 2.1.2, you should perform an index upgrade of your old indices so that they get migrated to the new Lucene format.
All the new indices you will create in 2.1.2 will have doc values enabled by default, so there's nothing special to be done there.
However, all your old indices need to be upgraded first in order to leverage the Lucene format used in ES 2.1.2. After that index upgrade, all your old indices will start using doc values for all existing fields (except analyzed strings, of course), BUT the data that is already indexed will not be back-filled into doc values files. To get doc values for your existing data, you'll need to reindex it. All the new data coming into your old upgraded indices will be using doc values, though.
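The index upgrade mentioned above is exposed in ES 2.x as the `_upgrade` endpoint. A minimal sketch of the two requests involved, assuming a node at `http://localhost:9200` and a hypothetical index called `old_index`; the helper functions are illustrative and only build the requests without sending them.

```python
from urllib.request import Request


def upgrade_request(base_url: str, index: str) -> Request:
    """POST /<index>/_upgrade: rewrite old segments in the current Lucene format."""
    return Request(f"{base_url}/{index}/_upgrade", method="POST")


def upgrade_status_request(base_url: str, index: str) -> Request:
    """GET /<index>/_upgrade: report how much of the index still needs upgrading."""
    return Request(f"{base_url}/{index}/_upgrade", method="GET")


req = upgrade_request("http://localhost:9200", "old_index")
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would execute it against a live cluster. Note that this only rewrites segments; it does not replace the reindex needed to back-fill doc values.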
Is there a way to find out the names of all the indices ever created, even after an index has been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.
We are thinking about implementing some sort of message cache which would hold onto the messages we send to our search index so we could persist while the index was down for an extended period of time (for example a complete re-index) then 're-apply' the messages. These messages are creations or updates of the documents we index. If space were cheap enough, with something as scalable as Couchbase we may even be able to hold all messages but I haven't done any sort of estimations of message size and quantity yet. Anyway, I suggested Couchbase + XDCR + Elasticsearch for this task as most of the work would be done automatically however there are 4 questions I have remaining:
1. If we were implementing this as a cache, I would not want Elasticsearch to remove any documents that were not in Couchbase. Is this possible to do (perhaps it is even the default behaviour)?
2. Is it possible to apply some sort of versioning so that a document in the index is not overwritten by an older version coming from Couchbase?
3. If I were to add a new field to the index, I might need to re-index from the actual document datasource and then re-apply all the messages stored in Couchbase. I may have 100 million documents in Elasticsearch and, say, 500,000 documents in Couchbase that I want to re-apply to Elasticsearch. What would the speed be like?
4. Would I be able to apply any sort of logic in between Couchbase and Elasticsearch?
Update:
So we store documents in an RDBMS as we need instant access to inserted docs plus some other stuff. We send limited versions of the document to a search engine via messages. If we want to add a field to the index we need to re-index the system from the RDBMS somehow. If we have this Couchbase message cache we could add the field to messages first, then switch off the indexing of old messages and re-index from the RDBMS. We could then switch back on the indexing of the messages and the entire 'queue' of messages would be indexed without having lost anything.
This system (if it worked) would remove the need for an MQ server, a message listener and make sure no documents were missing from the index.
The versioning would be necessary as we don't want to apply an 'update' to the index which actually contains a more recent document (not sure if this would ever happen now I think about it).
I appreciate it's probably not too great a job to implement points 1 and 4 by changing the Elasticsearch plugin code but I would like to confirm that the idea is reasonable first!
The Couchbase-Elasticsearch integration today should be seen as an indexing engine for Couchbase. This means the index is "managed/controlled" by the data that are in Couchbase.
XDCR is used to send "all the events" to Elasticsearch. This means the index is updated every time a document (stored in Couchbase) is created, modified or deleted.
So "all the documents" stored into a Couchbase bucket are indexed into Elasticsearch.
Let's answer your questions one by one, based on the current implementation of the Couchbase-Elasticsearch.
1. When a document is removed from Couchbase, the Elasticsearch index is updated (the entry is removed).
2. I'm not sure I understand the question. How could an "older" version come from Couchbase? In any case, every time a document stored in Couchbase is modified, the index in Elasticsearch is updated.
3. I'm not sure I understand where you want to add a new field. If it is in a document stored in Couchbase, the index will be updated when the document is sent to Elasticsearch. But as I said before, all documents stored in Couchbase will be present in the Elasticsearch index.
4. Not with the plugin as it is today, but as you know it is an open source project, so you can either add some logic to it or even contribute your ideas to the project ( https://github.com/couchbaselabs/elasticsearch-transport-couchbase )
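On the versioning question (the second one above): independent of the Couchbase plugin, Elasticsearch's own index API accepts `version` and `version_type=external` parameters, and with those a write is rejected with a version conflict unless its version is higher than the stored one. Whether the Couchbase transport plugin makes use of this is not stated here. The rule itself can be illustrated with a small in-memory simulation; the class and names below are hypothetical and are not Elasticsearch code.

```python
class VersionConflict(Exception):
    """Raised when an incoming write is not newer than the stored document."""


class ExternalVersionStore:
    """Toy model of version_type=external: only strictly newer versions win."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, version, source):
        current = self._docs.get(doc_id)
        if current is not None and version <= current[0]:
            raise VersionConflict(f"{doc_id}: {version} <= stored {current[0]}")
        self._docs[doc_id] = (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]


store = ExternalVersionStore()
store.index("a", 5, {"title": "new"})
try:
    store.index("a", 4, {"title": "stale"})  # replayed older message
except VersionConflict as exc:
    print("skipped:", exc)
```

With this rule, replaying a queue of cached messages in any order leaves the most recent version of each document in place, provided every message carries a monotonically increasing version such as a timestamp or sequence number.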
So let me ask you more questions:
- how do you insert the document into your application? (and where: Couchbase? Elasticsearch?)
- what are the types of documents?
- what do you want to cache into Couchbase?