Reindex and alias - avoid duplicate search results - elasticsearch

I have one search alias that points at a single index, index_1, with is_write_index set to true. Because of mapping changes I need to reindex, and this is my process:
1. Create a new index, index_2, with the new mapping.
2. Reindex index_1 into index_2.
3. Add the alias used by index_1 to index_2, with is_write_index set to true for index_2.
4. Reindex index_1 into index_2 again to sync the latest changes.
5. Delete index_1.
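The problem is that from step 3 onward, queries to the alias return duplicate results. The alias now points at both indices, so a search runs against both, and every document that exists in both is returned twice. How can I avoid this?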

I found an answer. The short-term solution is to attach a filter to the alias on index_2 that can never match, so the duplicate results from index_2 are filtered out. For example:
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "index_1",
        "alias": "aliasName",
        "is_write_index": true
      }
    },
    {
      "add": {
        "index": "index_2",
        "alias": "aliasName",
        "filter": {
          "term": {
            "myGuaranteedToExistField": "impossibleToFindValue"
          }
        }
      }
    }
  ]
}
When indexing is done and the results are verified, I flip the indices. I'm not sure this is the best solution, but it works.
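The final flip can be a single _aliases request. The sketch below uses Python and assumes the alias and index names from the question and a cluster at http://localhost:9200 (both are assumptions). It removes the alias from index_1 and re-adds it to index_2 as the write index, this time without the filter. Re-adding an existing alias should replace its previous definition, including the filter, but verify that on your version. `flip_alias_body` and `send_aliases` are hypothetical helper names.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: adjust to your cluster


def flip_alias_body(alias, old_index, new_index):
    """Build one _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Stop searching (and writing to) the old index through the alias.
            {"remove": {"index": old_index, "alias": alias}},
            # Re-add the alias on the new index with no filter, so its
            # documents become visible, and make it the write index.
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


def send_aliases(body, host=ES_HOST):
    """POST the body to /_aliases and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{host}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the body here; call send_aliases(body) to apply it.
    print(json.dumps(flip_alias_body("aliasName", "index_1", "index_2"), indent=2))
```

Both actions travel in one request, so Elasticsearch applies them atomically. Searches through the alias should therefore never see index_1 and an unfiltered index_2 at the same time.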

Related

What is the best way to update cache in elasticsearch

I'm using an Elasticsearch index as a cache table.
My document structure is the following:
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "query_str": { "type": "text" },
      "search_results": {
        "type": "object",
        "enabled": false
      },
      "query_embedding": {
        "type": "dense_vector",
        "dims": 768
      }
    }
  }
}
Cache lookups use embedding-vector similarity. If the embedding of a new query is close enough to a cached one, it counts as a cache hit, and the search_results field is returned to the user.
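The sketch below shows such a lookup request in Python. It assumes a recent 7.x or 8.x cluster where script_score accepts `cosineSimilarity(params.query_vector, 'query_embedding')` with the field name as a string, plus the mapping above. cosineSimilarity returns a value in [-1, 1], so the script adds 1.0 to keep scores non-negative. The top-level min_score then turns "close enough" into a threshold. The threshold value and the function names are hypothetical.

```python
import json


def cache_lookup_body(query_vector, threshold=0.95):
    """Build a search body that returns the single closest cached entry,
    but only if its cosine similarity is at least `threshold`."""
    return {
        "size": 1,
        # Drop anything below the threshold; +1.0 matches the script's shift.
        "min_score": threshold + 1.0,
        "_source": ["query_str", "search_results"],
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # cosineSimilarity is in [-1, 1]; adding 1.0 keeps the
                    # score non-negative, which script_score requires.
                    "source": "cosineSimilarity(params.query_vector, 'query_embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }


def is_cache_hit(search_response):
    """A hit means at least one document scored above min_score."""
    return len(search_response["hits"]["hits"]) > 0


if __name__ == "__main__":
    vector = [0.1] * 768  # placeholder; use the real 768-dim query embedding
    print(json.dumps(cache_lookup_body(vector, threshold=0.9))[:200])
```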
The problem is that I need to refresh the cached results about once an hour. I don't want the service to lose the ability to use the cache efficiently during the update, so I'm not sure which of these solutions is best:
1. Update documents one by one, so the index is never torn down. The drawback I'm afraid of is that every update causes the index to be rebuilt, so cache requests will become slow.
2. Create an entirely new index with the new results and then somehow swap the current cache index for the new one. The drawbacks I see are:
   a) I've found no elegant way to swap indexes.
   b) Users will get the refreshed cached results later than with solution (1).
I would go with #2, because every time you update a document the cache is flushed.
There is an elegant way to swap indices:
You have an alias that points to your current index, you fill a new index with the fresh records, and then you point this alias to the new index.
Something like this:
The current index is named items-2022-11-26-001.
Create an alias, items, pointing to items-2022-11-26-001:
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    }
  ]
}
Create a new index, items-2022-11-26-002, and fill it with fresh data.
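When it finishes, point the items alias at items-2022-11-26-002. Both actions are in one _aliases request, so Elasticsearch applies them atomically and the alias never points at both indices or at neither: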
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "items-2022-11-26-001",
        "alias": "items"
      }
    },
    {
      "add": {
        "index": "items-2022-11-26-002",
        "alias": "items"
      }
    }
  ]
}
Delete items-2022-11-26-001.
Run all your queries against the items alias, which acts as an index.
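For an hourly rebuild, the naming and swapping can be scripted. This is a minimal Python sketch, assuming index names always end in -YYYY-MM-DD-NNN as in the example. `next_index_name` picks the next name, and the sequence resets on a new day. `swap_alias_body` builds the single _aliases request shown above. Both function names are hypothetical.

```python
import datetime
import json


def next_index_name(current, today):
    """Return the next index name in the <prefix>-YYYY-MM-DD-NNN scheme."""
    # rsplit keeps any hyphens inside the prefix intact.
    prefix, year, month, day, seq = current.rsplit("-", 4)
    current_day = datetime.date(int(year), int(month), int(day))
    # Same day: bump the sequence number; new day: start again at 001.
    number = int(seq) + 1 if current_day == today else 1
    return f"{prefix}-{today:%Y-%m-%d}-{number:03d}"


def swap_alias_body(alias, old_index, new_index):
    """One _aliases request; Elasticsearch applies its actions atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    new_index = next_index_name("items-2022-11-26-001", datetime.date(2022, 11, 26))
    print(new_index)  # items-2022-11-26-002
    print(json.dumps(swap_alias_body("items", "items-2022-11-26-001", new_index), indent=2))
```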
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

How to reindex and change _type

We need to migrate a number of indexes from Elasticsearch 6.8 to Elasticsearch 7.x. To do this, we first need to fix a large number of documents whose _type field is not _doc, as required. We fixed this for newer indexes, but some of the older data, which we still need, has other values there.
How do we reindex these indexes and also change the _type field?
POST /_reindex
{
  "source": {
    "index": "my-index-2021-11"
  },
  "dest": {
    "index": "my-index-2021-11-n"
  },
  "script": {
    "source": "ctx._type = '_doc';"
  }
}
I saw a post suggesting that the above might work, but when I ran it, the _type in the new index was still the value from my-index-2021-11.
The only other option I can think of is to iterate through every document in the index and add it to the new index again, which should create the correct _type. That would take days to complete, though, so I'm not keen on it.
I think the following should work. Please test it before running it on real data:
POST _reindex
{
  "source": {
    "index": "my-index-2021-11"
  },
  "dest": {
    "index": "my-index-2021-11-n",
    "type": "_doc"
  }
}
Documentation that may help with the upgrade:
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/reindex-upgrade-inplace.html
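Several indexes need the same treatment, so the request above can be generated once per index. This is a minimal Python sketch. It assumes the 6.8 cluster accepts dest.type (it belongs to the mapping-types era and does not apply after 7.x) and uses the "-n" suffix from the question. With wait_for_completion=false, each call returns a task ID instead of blocking. The index list and the function name are hypothetical.

```python
import json


def reindex_to_doc_type(source_index, suffix="-n"):
    """Return the path and body that copy source_index into a new index
    whose documents all get the _doc type (6.8 only)."""
    body = {
        "source": {"index": source_index},
        # dest.type rewrites the type of every copied document.
        "dest": {"index": source_index + suffix, "type": "_doc"},
    }
    # Run as a background task; poll GET _tasks/<task_id> for progress.
    return "/_reindex?wait_for_completion=false", body


if __name__ == "__main__":
    # Hypothetical list of old indexes that still carry non-_doc types.
    for name in ["my-index-2021-10", "my-index-2021-11"]:
        path, body = reindex_to_doc_type(name)
        print("POST", path)
        print(json.dumps(body, indent=2))
```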

Adding default value to existing mapping in Elasticsearch

I have an index with a mapping, and I decided to add a new field to the existing mapping:
{
  "properties": {
    "sexifield": {
      "type": "keyword",
      "null_value": "NULL"
    }
  }
}
As I understand it, the field should appear in existing documents when I reindex. So I call the reindex API:
POST _reindex
{
  "source": {
    "index": "index_v1"
  },
  "dest": {
    "index": "index_v2",
    "version_type": "external"
  }
}
The mapping for index_v2 does not contain sexifield, and neither do the documents. The operation also took less than 60 ms.
What am I misunderstanding here?
Adding new documents to the first index via the Java API does not create the field either. These are documents for an entity that has no sexifield, so I expected Elasticsearch to add the default value.
Great question (I learned something while solving your problem).
I don't know how to make the reindex take the second (destination) mapping into account. Once reindexing from the original index is done, though, here is how I would update all documents in the reindexed index. I'm still looking into whether the defaults defined in the second index's mapping can be applied during the reindex itself. For now, see if this helps:
POST /index_v2/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.sexifield = params.null_value",
    "params": {
      "null_value": "NULL"
    }
  }
}
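Two documented behaviours may explain what the question observes. First, reindex does not set up the destination index or copy the source's mappings, so index_v2 has to be created with the new mapping before running _reindex. Second, null_value only replaces explicit null values at index time. It neither adds a missing field nor changes _source.

Under those assumptions, the Python sketch below builds two bodies. One is the create-index body for index_v2, in typeless 7.x mapping syntax. The other is a variant of the update above that only touches documents lacking the field, using an exists query inside must_not. The function names are hypothetical.

```python
import json

FIELD = "sexifield"
DEFAULT = "NULL"


def create_index_body():
    """Body for PUT /index_v2, sent before _reindex so the mapping exists."""
    return {
        "mappings": {
            "properties": {
                # null_value only applies when the field is explicitly null.
                FIELD: {"type": "keyword", "null_value": DEFAULT}
            }
        }
    }


def fill_missing_body():
    """Body for POST /index_v2/_update_by_query that sets the default
    only on documents where the field is absent."""
    return {
        "query": {"bool": {"must_not": {"exists": {"field": FIELD}}}},
        "script": {
            "lang": "painless",
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": FIELD, "value": DEFAULT},
        },
    }


if __name__ == "__main__":
    print(json.dumps(create_index_body(), indent=2))
    print(json.dumps(fill_missing_body(), indent=2))
```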

Best way to reindex multiple indices in ElasticSearch

I am using Elasticsearch 5.1.1 and have 500+ indices created with the default mapping provided by ES.
Now we have decided to use dynamic templates.
To apply this template/mapping to the old indices, I need to reindex all of them.
What is the best way to do it? Can we use Kibana for this? I couldn't find sufficient documentation.
Example: reindex the daily indices for August 2019 into one monthly index:
POST _reindex?slices=10&refresh
{
  "source": {
    "index": "myindex-2019.08.*"
  },
  "dest": {
    "index": "myindex-2019.08"
  }
}
Monitor the reindex task (wait until it has finished):
GET _tasks?detailed=true&actions=*reindex
Check that the new index was created:
GET _cat/indices/myindex-2019.08*?v&s=index
Then you can delete the old daily indices:
DELETE myindex-2019.08.*
Source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
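The month-based names in this example can be derived instead of typed. This is a minimal Python sketch, assuming daily indices named <prefix>-YYYY.MM.DD as in the example; `monthly_reindex` is a hypothetical name. It also uses Python's fnmatchcase as a stand-in for Elasticsearch's `*` wildcard. That check confirms the source pattern does not match the monthly destination, which is why the DELETE above leaves the new index alone.

```python
import json
from fnmatch import fnmatchcase


def monthly_reindex(prefix, year, month):
    """Return (dest_index, body) that merge the daily indices of one month."""
    dest = f"{prefix}-{year}.{month:02d}"
    source_pattern = f"{dest}.*"  # matches myindex-2019.08.01, ... but not dest
    body = {"source": {"index": source_pattern}, "dest": {"index": dest}}
    return dest, body


if __name__ == "__main__":
    dest, body = monthly_reindex("myindex", 2019, 8)
    print("POST _reindex?slices=10&refresh")
    print(json.dumps(body, indent=2))
    # The trailing ".*" means the monthly index itself is never matched.
    print(fnmatchcase(dest, body["source"]["index"]))  # False
```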
You can use the _reindex API, which can also reindex multiple indices; it was built specifically for this.
A Bash script to reindex all indices matching a pattern: https://gist.github.com/hartfordfive/e507bc47e17f4e03a89055918900e44d
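For the original case, applying a new template to each of the 500+ existing indices, one option is to reindex every index into a new one whose name the new template matches. That means one request per index, in the spirit of the linked script. The Python sketch below assumes the index names come from the JSON output of GET _cat/indices?format=json. The "-v2" suffix and the function name are hypothetical. wait_for_completion=false returns one task ID per request, and each task can then be monitored with GET _tasks as shown earlier.

```python
from fnmatch import fnmatchcase


def plan_reindex(cat_indices, pattern, suffix="-v2"):
    """Return (path, body) pairs, one _reindex request per matching index.

    cat_indices is the parsed output of GET _cat/indices?format=json,
    i.e. a list of dicts that each have an "index" key.
    """
    plan = []
    for entry in sorted(cat_indices, key=lambda e: e["index"]):
        name = entry["index"]
        # Skip non-matching indices and ones that are already reindex targets.
        if not fnmatchcase(name, pattern) or name.endswith(suffix):
            continue
        body = {"source": {"index": name}, "dest": {"index": name + suffix}}
        plan.append(("/_reindex?wait_for_completion=false", body))
    return plan


if __name__ == "__main__":
    sample = [{"index": "logs-2017.01.01"}, {"index": ".kibana"}]
    for path, body in plan_reindex(sample, "logs-*"):
        print("POST", path, body)
```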
To reindex only the documents whose field matches a given value, add a query to the source:
POST _reindex
{
  "source": {
    "index": "auditbeat",
    "query": {
      "match": {
        "agent.version": "7.6.0"
      }
    }
  },
  "dest": {
    "index": "auditbeat-7.6.0"
  }
}
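The same filtered request can be generated for several values, for example to split auditbeat into one index per agent version. This is a minimal Python sketch that keeps the match query and the <index>-<value> naming from the example. The version list and the function name are hypothetical.

```python
import json


def split_by_field(source_index, field, values):
    """Return one filtered _reindex body per value, writing to <source>-<value>."""
    bodies = []
    for value in values:
        bodies.append({
            "source": {
                "index": source_index,
                # Only documents whose field matches this value are copied.
                "query": {"match": {field: value}},
            },
            "dest": {"index": f"{source_index}-{value}"},
        })
    return bodies


if __name__ == "__main__":
    for body in split_by_field("auditbeat", "agent.version", ["7.6.0", "7.7.0"]):
        print("POST _reindex")
        print(json.dumps(body, indent=2))
```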

How do I remove a mapping on all indices including .kibana and .marvel?

I am new to Elasticsearch for .NET (NEST) and didn't specify the index when adding a mapping. Now the mapping exists on my indices for Kibana & Marvel.
How do I undo what I've done? I'm using Elasticsearch 2.* and can't delete the mapping. They say to just reindex, but I'm not sure how to do that for these indices.
".kibana": {
  "mappings": {
    "company": {
      "properties": {
        "iD": {
          "type": "double",
          "precision_step": 1
        }
      }
    }
  }
},
Unfortunately, you can't.
The only way to remove a mapping is to recreate the index without that mapping. The impact of that mapping (as goofy as it is) is low.
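Recreating the index starts with its current mappings minus the unwanted type. This is a minimal Python sketch of that first step, assuming the 2.x response shape of GET /.kibana/_mapping shown in the question: index name, then "mappings", then one entry per type. `drop_mapping_type` is a hypothetical helper, and "other_type" is a placeholder. The resulting body could be used to create the replacement index before copying the documents over.

```python
import copy
import json


def drop_mapping_type(mapping_response, index, type_name):
    """Return a create-index body with every mapping type except type_name."""
    mappings = copy.deepcopy(mapping_response[index]["mappings"])
    mappings.pop(type_name, None)  # remove the accidental type, if present
    return {"mappings": mappings}


if __name__ == "__main__":
    # Shape of GET /.kibana/_mapping; "other_type" is a hypothetical placeholder.
    response = {
        ".kibana": {
            "mappings": {
                "company": {"properties": {"iD": {"type": "double", "precision_step": 1}}},
                "other_type": {"properties": {"name": {"type": "string"}}},
            }
        }
    }
    print(json.dumps(drop_mapping_type(response, ".kibana", "company"), indent=2))
```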
