is there a way to find out the names of all the indices ever created? Even after the index might have been deleted. Does elastic store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.
Related
We want to keep track of all the changes of a document, so we want to store all the document versions in separate index.
Is there a way when a new document is added or changes to send the entire document in another index? Maybe there is a processor for this use case?
As far as I know, Elasticsearch as such supports only version numbers but there is no way to trace back to previous version.
You could maintain version history in a seperate elastic index
Whenever you update main_index ensure that you update main_index as well
POST main_index/_doc/doc_id
POST main_index/_doc/doc_id_version
May be you can configure logstash to do this...not sure
I am facing a strange issue in the number of docs getting deleted in an elasticsearch index. The data is never deleted, only inserted and/or updated. While I can see that the total number of docs are increasing, I have also been seeing some non-zero values in the docs deleted column. I am unable to understand from where did this number come from.
I tried reading whether the update doc first deletes the doc and then re-indexes it so in this way the delete count gets increased. However, I could not get any information on this.
The command I type to check the index is:
curl -XGET localhost:9200/_cat/indices
The output I get is:
yellow open e0399e012222b9fe70ec7949d1cc354f17369f20 zcq1wToKRpOICKE9-cDnvg 5 1 21219975 4302430 64.3gb 64.3gb
Note: It is a single node elasticsearch.
I expect to know the reason behind deletion of docs.
You are correct that updates are the cause that you see a count for documents delete.
If we talk about lucene then there is nothing like update there. It can also be said that documents in lucene are immutable.
So how does elastic provides the feature of update?
It does so by making use of _source field. Therefore it is said that _source should be enabled to make use of elastic update feature. When using update api, elastic refers to the _source to get all the fields and their existing values and replace the value for only the fields sent in update request. It marks the existing document as deleted and index a new document with the updated _source.
What is the advantage of this if its not an actual update?
It removes the overhead from application to always compile the complete document even when a small subset of fields need to update. Rather than sending the full document, only the fields that need an update can be sent using update api. Rest is taken care by elastic.
It reduces some extra network round-trips, reduce payload size and also reduces the chances of version conflict.
You can read more how update works here.
How can someone know the ElasticSearch version of an Index on disk? I have a case where I'd like to know what version of ElasticSearch an index was created with so that I can perform some additional steps before taking on migration of the index to a newer ES version. Like perhaps explain to a user on upgrade -- "Hey, this might take a while, need to migrate your index." The assumption here is that ES is shutdown at this point and I cannot directly get the ES version from ElasticSearch. Additionally, there may be more than one index and therefor more than one version for that set of indexes... (not sure why that would be the case, but better to expect the worst).
Based on the Index data on disk, how can someone tell the version of ElasticSearch which produced that Index?
I would like to update an ElasticSearch Document while maintaining the document's version the same. I'm using version_type=external as indicated in the versioning section of the index_ documentation. Updating a document with another of the same version is normally prevented as indicated in that section: "If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail."
The reason I want to keep the version unaltered is because I do not create a new version of my object (stored in my database) when one adds new tags to that object, but I would like the new tags to show up in my ElasticSearch index. Is this possible with ElasticSearch?
I tried deleting the document and then adding a new document with the same Id and Version but that still gives me the following exception:
VersionConflictEngineException[[myindex][2] [mytype][6]: version
conflict, current 1, provided 1]
Just for reference, I'm using PHP Elastica (with methods $type->deleteDocument($doc); and $type->addDocument($doc);) but this question should apply to ElasticSearch in general.
The time for which elasticsearch keeps information about deleted documents is controlled by index.gc_deletes parameter. By default this time is 1m. So, theoretically, you can decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master due to a bug. If you are using older version of elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
There is a good blog post on elasticsearch.org web site that describes how versions are handled by elasticsearch in details.
My understanding was that Elasticsearch would store the lastest copy of the document and just update the version field number? But I was playing around with a few thousand documents and had the need to index them repeatedly without changing any data in the document. My thinking was that the index size would remain the same, but that wasn't the case ... the index size seemed to increase.
This confused me a little bit, so i just wanted to seek clarification on the internal mechanism of versioning within elasticsearch.
An update is a Delete + Insert Lucene operation behind the scene.
But you should know that Lucene does not really delete the document but mark it as deleted.
To remove deleted docs, you have to optimize your Lucene segments.
$ curl -XPOST 'http://localhost:9200/twitter/_optimize?only_expunge_deletes=true'
See Optimize API. Also have a look at merge options. Merging segments happens behind the scene at some time.
For a general overview of versioning support in Elasticsearch, please refer to the Elasticsearch Versioning Support.