Does upgrading Beats and index templates affect old data in Elasticsearch?

Does updating Beats from version 6.x to 7.x, and consequently updating the Elasticsearch index template, affect old data in Elasticsearch?

No. Index templates are only applied at index creation and don't affect existing indices.

Related

Get results only over a specified min score in Elastic Search

I'm using Elasticsearch to index and search for documents. I want it to return only documents that score above a specific threshold.
I'm currently using version 7.12.
I found a way to specify the minimum score here in the official documentation, but it is for an older version (6.8).
You can always change the Elasticsearch Guide version in the link you shared to the one you need.
https://www.elastic.co/guide/en/elasticsearch/reference/7.13/search-search.html#search-api-min-score
https://www.elastic.co/guide/en/elasticsearch/reference/7.x/search-search.html#search-api-min-score
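As a minimal sketch of what the linked section describes: min_score is a top-level parameter of the search request body, next to query. The Java snippet below only builds such a body as a string. The class name, the match query, the field name and the threshold value are illustrative assumptions, and actually sending the request (for example as a POST to the index's _search endpoint) is left out.

```java
final class MinScoreSearchBody {

    // Escapes backslashes and double quotes so the value can sit inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a search body with a top-level "min_score" and a simple match query.
    static String build(String field, String text, double minScore) {
        return "{\"min_score\": " + minScore
                + ", \"query\": {\"match\": {\"" + escape(field) + "\": \""
                + escape(text) + "\"}}}";
    }

    public static void main(String[] args) {
        // Hypothetical field name and threshold, for illustration only.
        System.out.println(build("title", "elasticsearch", 0.5));
    }
}
```

Scores are relative to the query, so a threshold that works for one query may be too strict or too loose for another.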

Elasticsearch data comparison

I have two different Elasticsearch clusters: one is Elasticsearch 6.x with the data, and the second is a new Elasticsearch 7.7.1 cluster with pre-created indexes.
I reindexed the data from Elasticsearch 6.x to Elasticsearch 7.7.1.
Is there any way to get each doc from the source and compare it with the target doc, in order to check that the data is there and has not been altered in some way?
When you perform a reindex, the data is indexed according to the destination index mapping, so if the mappings are the same you should get the same results in search. The _source value will be the same in both indices, but that alone doesn't mean your search results will be the same. If you really want to be sure everything is OK, you should compare the inverted indexes generated by both indices for full-text search. This data can be really big and there is no easy way to retrieve it; you can check this for getting the term-document matrix.
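For the simpler part of the check (that every document arrived and its _source is unchanged), a comparison along the following lines may be enough. This is only a sketch: it assumes the _source of each document has already been fetched from both clusters into maps keyed by document id (how they are fetched is not shown), and that an unchanged _source is character-for-character identical. The class and method names are made up for this example, and it says nothing about the inverted index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SourceComparison {

    // Returns one line per document id that is missing, different, or unexpected.
    static List<String> diff(Map<String, String> sourceDocs, Map<String, String> targetDocs) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, sorted order for the report.
        for (Map.Entry<String, String> e : new TreeMap<>(sourceDocs).entrySet()) {
            String target = targetDocs.get(e.getKey());
            if (target == null) {
                problems.add("missing in target: " + e.getKey());
            } else if (!target.equals(e.getValue())) {
                problems.add("differs: " + e.getKey());
            }
        }
        for (String id : new TreeMap<>(targetDocs).keySet()) {
            if (!sourceDocs.containsKey(id)) {
                problems.add("extra in target: " + id);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("1", "{\"a\":1}");
        source.put("2", "{\"a\":2}");
        Map<String, String> target = new HashMap<>();
        target.put("1", "{\"a\":1}");
        System.out.println(diff(source, target));
    }
}
```

For large indices the maps would need to be filled and compared in batches rather than held in memory all at once.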

ElasticSearch 1.7 (Spring Data ElasticSearch) update by query takes lot of time to update documents

My application allows updating multiple Elasticsearch documents in a single request.
I use the Elasticsearch BulkRequestBuilder to update all such documents in bulk.
// Collect one partial-document update per id into a single bulk request.
BulkRequestBuilder bulkRequestBuilder = elasticSearchClient.prepareBulk();
documents.forEach(id -> {
    UpdateRequest updateRequest = new UpdateRequestBuilder(elasticSearchClient)
            .setType("MyDocumentType")
            .setIndex("MyDocumentIndex")
            .setId(id)
            .setDoc("fieldName", "valueToBeUpdated")
            .request();
    bulkRequestBuilder.add(updateRequest);
});
// Send all updates in one bulk call.
bulkRequestBuilder.get();
All the documents are updated with valueToBeUpdated, but Elasticsearch takes time internally to apply the updates, and the call to bulkRequestBuilder.get() returns before the documents are updated (which suggests the Elasticsearch engine works asynchronously).
Could anyone please suggest how to make the updates of all documents synchronous?
I finally found the core issue (possibly the default behaviour) behind the Elasticsearch engine taking time to apply updates.
By default, updates in the Elasticsearch engine are asynchronous (as I already pointed out in my question). A couple of links explain this default behaviour.
For example, the Elasticsearch GET API documentation states that, in order to get a document, the engine does a refresh to make all previous updates visible, if any. This hints that the asynchronous nature of Elasticsearch is why an immediate search does not return my updated documents.
For now, to keep the existing behaviour, trigger the bulk update synchronously as follows.
bulkRequestBuilder.setReplicationType(ReplicationType.SYNC).setRefresh(true).get();
Problems when indexing or updating a lot of data usually come from segment merging in ES.
One tip from the ES people is to disable refresh before indexing or updating a lot of data.
You can achieve this by setting the index refresh_interval to -1 before indexing and, once all your data is indexed, restoring it to its previous value.
Reference: Tune for indexing speed (Elasticsearch documentation).
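A rough sketch of the refresh_interval toggle described above: the setting is changed with a PUT to the index's _settings endpoint, once before the bulk work and once after. The Java snippet only builds the path and JSON body as strings. The index name and the "1s" value restored at the end are assumptions for illustration, so read the actual previous value from your own index settings first.

```java
final class RefreshIntervalToggle {

    // Path of the index settings endpoint, to be used with HTTP PUT.
    static String settingsPath(String index) {
        return "/" + index + "/_settings";
    }

    // JSON body that sets index.refresh_interval to the given value, e.g. "-1" or "1s".
    static String settingsBody(String refreshInterval) {
        return "{\"index\": {\"refresh_interval\": \"" + refreshInterval + "\"}}";
    }

    public static void main(String[] args) {
        String index = "my-index"; // hypothetical index name
        String previous = "1s";    // assumed previous value, not read from the cluster
        System.out.println("PUT " + settingsPath(index) + " " + settingsBody("-1"));
        System.out.println("... bulk index or update here ...");
        System.out.println("PUT " + settingsPath(index) + " " + settingsBody(previous));
    }
}
```

While refresh is disabled, newly written documents do not become visible to search until the interval is restored or a refresh is triggered explicitly.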

Why sometimes Elasticsearch scroll or search returns a set of doc ids which cannot be individually retrieved?

I am seeing a strange problem where the Elasticsearch scroll or search API returns a set of documents that I can no longer get by their ids. I am using Elassandra (Cassandra + ES), which uses Elasticsearch as a secondary index store. The Cassandra records have a TTL and are dropped when it expires, but their ids are still there in Elasticsearch. Why does this happen? I ran a refresh and a forcemerge on the corresponding Elasticsearch index, but it didn't help.
Okay, I found the problem. The TTL on Cassandra deletes the record in Cassandra, but the entries in the custom secondary index that Elassandra builds on Elasticsearch are not deleted by that mechanism. In fact, TTL is no longer available in later versions of ES. The documents need to be deleted explicitly from ES, or we need time-partitioned indexes on ES so that old indexes can simply be deleted.
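The time-partitioned option could look roughly like the sketch below: one index per day, with indexes older than a retention window selected for deletion. The naming scheme (prefix, a dash, then a uuuu.MM.dd date), the class name and the retention value are all assumptions made for this example. The code only picks the index names, and deleting them is left to whatever client you use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ExpiredDailyIndices {

    // Hypothetical date suffix used in index names, e.g. "events-2021.01.09".
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuu.MM.dd");

    // Name of the daily index that documents for the given day would be written to.
    static String indexNameFor(String prefix, LocalDate day) {
        return prefix + "-" + day.format(SUFFIX);
    }

    // Returns the names whose date suffix is strictly older than today minus retentionDays.
    static List<String> expired(List<String> existing, String prefix, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> result = new ArrayList<>();
        for (String name : existing) {
            if (!name.startsWith(prefix + "-")) {
                continue; // belongs to another index family
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(prefix.length() + 1), SUFFIX);
                if (day.isBefore(cutoff)) {
                    result.add(name);
                }
            } catch (DateTimeParseException ex) {
                // Suffix is not a date in the assumed format, so leave this index alone.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("events-2021.01.01", "events-2021.01.10");
        System.out.println(expired(existing, "events", LocalDate.of(2021, 1, 11), 7));
    }
}
```

Dropping a whole index is much cheaper than deleting its documents one by one, which is the main appeal of this layout.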

Elasticsearch upgrade doc_values enabled?

I've recently upgraded my Elastic cluster from 1.7.5 to 2.1.2.
I've read that in version 2+ doc values are enabled by default, but I am wondering whether this applies to the upgrade I have performed. I have checked the _mapping and _settings on the cluster but can't see any references to doc values.
If my understanding of how doc values work is correct, I was hoping this would go some way towards alleviating memory consumption issues on the cluster.
After your cluster upgrade to 2.1.2, you should perform an index upgrade of your old indices so that they get migrated to the new Lucene format.
All new indices you create in 2.1.2 will have doc values enabled by default, so there's nothing special to be done there.
However, all your old indices need to be upgraded first in order to leverage the Lucene format used in ES 2.1.2. After that index upgrade, all your old indices will start using doc values for all existing fields (except analyzed strings, of course), BUT the already indexed data will not be back-filled into doc values files. To use doc values for your existing data, you'll need to reindex it. All new data coming into your old, upgraded indices will use doc values, though.
