Elasticsearch upgrade doc_values enabled? - elasticsearch

I've recently upgraded my Elastic cluster from 1.7.5 to 2.1.2.
I've read that in version 2+, doc values are enabled by default, but I am wondering whether this applies to the upgrade I have performed. I have checked my _mapping and _settings on the cluster but can't see any references to doc values.
If my understanding of how doc values work is correct, I was hoping this would go some way towards alleviating memory consumption issues on the cluster.

After upgrading your cluster to 2.1.2, you should perform an index upgrade of your old indices so that they get migrated to the new Lucene format.
All the new indices you create in 2.1.2 will have doc values enabled by default, so there's nothing special to be done there.
However, all your old indices need to be upgraded first in order to leverage the Lucene format used in ES 2.1.2. After that index upgrade, all your old indices will start using doc values for all existing fields (except analyzed strings, of course), BUT the already indexed data will not be back-filled into doc values files. For that, you'll need to reindex your data. All the new data coming into your old upgraded indices will use doc values, though.
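As a concrete sketch (assuming a cluster reachable on localhost:9200 and an old index named my_old_index — both placeholders), the 2.x upgrade API can be used to check and then trigger that segment migration:

```shell
# Check which segments of the (hypothetical) index still use the old Lucene format
curl -XGET 'http://localhost:9200/my_old_index/_upgrade?pretty'

# Trigger the upgrade of the index segments to the current Lucene format
curl -XPOST 'http://localhost:9200/my_old_index/_upgrade'
```

Keep in mind this only rewrites the segment format; back-filling doc values for already indexed data still requires a full reindex.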

Related

Different scores for identical documents after upgrading from spring-data-elasticsearch 4.2.1 to 4.3.0

I'm currently in the process of upgrading the Spring Boot version of my project. After upgrading from 2.5 to 2.6, a few tests started failing which deal with the retrieval of Elasticsearch documents. I'm trying to fetch only the highest-scoring documents, but when expecting 2 identical documents, only 1 is retrieved.
After reading up on the issue, I figured out that the problem comes down to the Elasticsearch index using multiple shards, each having their own scoring logic, and (probably?) the identical documents being fetched from different shards, thus resulting in different scores despite being virtually the same.
Now, can anyone tell me why this happens in the newer spring-data-elasticsearch version and if there is a setting to return it to the old functionality?
I've set up a little test project to play around with this. If anyone is interested in trying this for themselves, feel free to check it out: https://github.com/Moldavis/elasticsearch-scoring-poc
Actually found my own answer in the spring data breaking changes documentation (duh).
https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch-migration-guide-4.2-4.3.breaking-changes
search_type default value
The default value for search_type in Elasticsearch is query_then_fetch. This is now also the default value in the Query implementations; it was previously set to dfs_query_then_fetch.
The dfs_query_then_fetch option queries all shards for document and term frequencies in order to even out the scores between different shards. Since this is no longer used by default, the mentioned problem occurs.
It can be fixed by setting the search type for the query like so:
queryBuilder.withSearchType(SearchType.DFS_QUERY_THEN_FETCH);
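For reference, the same behavior can also be requested directly at the REST level via the search_type URL parameter (my_index is a placeholder index name):

```shell
curl -XGET 'http://localhost:9200/my_index/_search?search_type=dfs_query_then_fetch' \
     -H 'Content-Type: application/json' \
     -d '{"query": {"match_all": {}}}'
```

With dfs_query_then_fetch, identical documents on different shards get identical scores, at the cost of an extra round-trip to collect global term statistics.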

Why sometimes Elasticsearch scroll or search returns a set of doc ids which cannot be individually retrieved?

I am seeing a strange problem where the Elasticsearch scroll or search API returns a set of documents which I cannot get by their ids any more. I am using Elassandra (Cassandra + ES), which uses Elasticsearch as a secondary index store. There are TTLs on the Cassandra records, which are dropped when the TTL expires, but the ids are still there in Elasticsearch. Why this strange behaviour? I did a refresh and a forcemerge of the corresponding index on Elasticsearch, but it didn't help.
Okay, I found the problem. The TTL field on Cassandra deletes the record on Cassandra, but the custom secondary index Elassandra built on Elasticsearch doesn't get deleted by that mechanism. In fact, TTL is no longer available in newer versions of ES. The documents need to be deleted explicitly from ES, or we need to have a time-partitioned index on ES so that old indices can simply be deleted.
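A minimal sketch of the explicit cleanup, assuming the documents carry a date field named expires_at (a hypothetical field name) and that the delete-by-query API is available in your ES version (built-in since 5.x, a plugin in 2.x):

```shell
# Delete all documents from the (hypothetical) index whose TTL has passed
curl -XPOST 'http://localhost:9200/my_index/_delete_by_query' \
     -H 'Content-Type: application/json' \
     -d '{"query": {"range": {"expires_at": {"lte": "now"}}}}'
```

Run periodically (e.g. from cron), this keeps the ES side roughly in sync with Cassandra's TTL expiry.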

How to know an Index's ElasticSearch version on Disk?

How can someone know the ElasticSearch version of an index on disk? I have a case where I'd like to know what version of ElasticSearch an index was created with so that I can perform some additional steps before taking on migration of the index to a newer ES version. Like perhaps explain to a user on upgrade -- "Hey, this might take a while, need to migrate your index." The assumption here is that ES is shut down at this point and I cannot directly get the ES version from ElasticSearch. Additionally, there may be more than one index and therefore more than one version for that set of indexes... (not sure why that would be the case, but better to expect the worst).
Based on the Index data on disk, how can someone tell the version of ElasticSearch which produced that Index?

Elasticsearch : How to get all indices that ever existed

Is there a way to find out the names of all the indices ever created, even after an index might have been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.

Update ElasticSearch Document while maintaining its external version the same?

I would like to update an ElasticSearch document while keeping the document's version the same. I'm using version_type=external as indicated in the versioning section of the index API documentation. Updating a document with another of the same version is normally prevented, as indicated in that section: "If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail."
The reason I want to keep the version unaltered is because I do not create a new version of my object (stored in my database) when one adds new tags to that object, but I would like the new tags to show up in my ElasticSearch index. Is this possible with ElasticSearch?
I tried deleting the document and then adding a new document with the same Id and Version but that still gives me the following exception:
VersionConflictEngineException[[myindex][2] [mytype][6]: version
conflict, current 1, provided 1]
Just for reference, I'm using PHP Elastica (with methods $type->deleteDocument($doc); and $type->addDocument($doc);) but this question should apply to ElasticSearch in general.
The time for which Elasticsearch keeps information about deleted documents is controlled by the index.gc_deletes setting. By default this time is 1m. So, theoretically, you can decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master due to a bug. If you are using an older version of Elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
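The sequence above might look like this at the REST level (a sketch using the index, type, and id from the exception above; the document body is a placeholder):

```shell
# 1. Temporarily disable the grace period for deleted documents
curl -XPUT 'http://localhost:9200/myindex/_settings' \
     -d '{"index": {"gc_deletes": "0s"}}'

# 2. Delete the existing document
curl -XDELETE 'http://localhost:9200/myindex/mytype/6'

# 3. Give the deleted entry time to be garbage-collected
sleep 1

# 4. Index the new document with the same external version
curl -XPUT 'http://localhost:9200/myindex/mytype/6?version=1&version_type=external' \
     -d '{"tags": ["new-tag"]}'

# 5. Restore the default grace period
curl -XPUT 'http://localhost:9200/myindex/_settings' \
     -d '{"index": {"gc_deletes": "1m"}}'
```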
There is a good blog post on the elasticsearch.org web site that describes how versions are handled by Elasticsearch in detail.