How can I batch reindex several elastic indexes? - elasticsearch

For example, I have a fairly large number of indices named like:
logstash-oidmsgcn_2016.12.01
logstash-oidmsgcn_2016.12.02
logstash-oidmsgcn_2016.12.03
...
logstash-oidmsgcn_2017.02.21
that need to be reindexed under the names:
bk-logstash-oidmsgcn_2016.12.01
bk-logstash-oidmsgcn_2016.12.02
bk-logstash-oidmsgcn_2016.12.03
...
bk-logstash-oidmsgcn_2017.02.21
So I only need to add a prefix to their names, in a batch way. What can I do to get this job done?
I have looked at the reindex API and the bulk API, but I still can't figure out how to use them for this.

You can only do this by reindexing all your indices. If you are open to doing this, you can do it with the reindex API like this:
POST _reindex
{
  "source": {
    "index": "logstash-oidmsgcn_*"
  },
  "dest": {
    "index": "bk-logstash-oidmsgcn"
  },
  "script": {
    "inline": "ctx._index = 'bk-logstash-oidmsgcn_' + (ctx._index.substring('logstash-oidmsgcn_'.length(), ctx._index.length()))"
  }
}
Note that you need to enable dynamic scripting in order for this to work.
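If enabling dynamic scripting isn't an option, the same rename can be done without any script by issuing one _reindex request per index and computing the prefixed name client-side. A minimal Python sketch, assuming a cluster at http://localhost:9200 and an illustrative index list (both are my assumptions, not from the question):

```python
import json
import urllib.request

def reindex_body(source_index: str, prefix: str = "bk-") -> dict:
    """Build a _reindex body copying one index to its prefixed counterpart."""
    return {
        "source": {"index": source_index},
        "dest": {"index": prefix + source_index},
    }

def reindex(es_url: str, source_index: str) -> None:
    """POST the body to /_reindex; raises urllib.error.HTTPError on failure."""
    req = urllib.request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=json.dumps(reindex_body(source_index)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).read()

# Usage against a running cluster (hypothetical URL):
# for name in ["logstash-oidmsgcn_2016.12.01", "logstash-oidmsgcn_2016.12.02"]:
#     reindex("http://localhost:9200", name)
```

One request per index is slower to orchestrate than a single wildcard reindex, but it needs no scripting and each copy can be retried independently.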

Related

How to reindex and change _type

We need to migrate a number of indices from Elasticsearch 6.8 to Elasticsearch 7.x. To be able to do this, we now need to go back and fix a large number of documents, because the _type field of these documents isn't _doc as required. We fixed this for newer indices, but some of the older data, which we still need, has other values in there.
How do we reindex these indices and also change the _type field?
POST /_reindex
{
  "source": {
    "index": "my-index-2021-11"
  },
  "dest": {
    "index": "my-index-2021-11-n"
  },
  "script": {
    "source": "ctx._type = '_doc';"
  }
}
I saw a post indicating the above might work, but on execution, the value of _type in the new index was still the existing value from my-index.
The one option I can think of is to iterate through each document in the index and add it to the new index again, which should create the correct _type, but that would take days to complete, so I'm not keen on doing that.
I think the below should work. Please test it out before running it on actual data:
POST /_reindex
{
  "source": {
    "index": "my-index-2021-11"
  },
  "dest": {
    "index": "my-index-2021-11-n",
    "type": "_doc"
  }
}
Docs to help with the upgrade:
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/reindex-upgrade-inplace.html
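Since the migration covers a number of indices, the request above has to be repeated per index. A hedged sketch of building the per-index bodies in Python; the "-n" suffix mirrors the naming in the question, and nothing here is prescribed by Elasticsearch:

```python
def upgrade_body(index: str) -> dict:
    """_reindex body that coerces every copied document's _type to _doc.

    dest.type works on clusters that still accept mapping types (6.x, and 7.x
    compatibility mode); it was removed entirely in 8.x.
    """
    return {
        "source": {"index": index},
        "dest": {"index": index + "-n", "type": "_doc"},
    }

# One POST /_reindex per body, e.g.:
# for name in ["my-index-2021-10", "my-index-2021-11"]:
#     post_reindex(upgrade_body(name))   # post_reindex is your HTTP helper
```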

elasticsearch reindex doesn't use the index name specified in the script

I tried reindexing daily indices from a remote cluster, following the reindex-daily-indices example:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://remote_es:9200"
    },
    "index": "telemetry-*"
  },
  "dest": {
    "index": "dummy"
  },
  "script": {
    "lang": "painless",
    "source": """
      ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
    """
  }
}
It looks like if the new ctx._index is exactly the same as the original ctx._index, dest.index is used instead: all the records get reindexed into the "dummy" index.
Is this a bug or intended behaviour? I could not find any explanation to this behaviour.
Is there a way to reindex (multiple indices) from remote and still preserve the original name?
It's because, according to your logic, the destination index name is the same as the source index name. In the documentation you linked to, they are appending '-1' to the end of the index name.
In your case, the following logic just sets the destination index name to the same value as the source index name; reindex doesn't allow that, so it uses the destination index name specified in dest.index:
ctx._index = 'telemetry-' + (ctx._index.substring('telemetry-'.length(), ctx._index.length()));
Also worth noting that this case has been reported here and here.
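Given that behaviour, one workaround is to drop the script entirely and issue one remote reindex per index, with dest.index set to the same name, so each index keeps its original name. A sketch; the host is taken from the question, the index names and the HTTP helper are placeholders:

```python
def remote_reindex_body(remote_host: str, index: str) -> dict:
    """_reindex body that copies one index from a remote cluster,
    preserving its name on the local cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},
            "index": index,
        },
        "dest": {"index": index},
    }

# One POST /_reindex per daily index, e.g.:
# for name in ["telemetry-2021.01.01", "telemetry-2021.01.02"]:
#     post_reindex(remote_reindex_body("http://remote_es:9200", name))
```

Because dest.index is set explicitly per request, no script (and no same-name rewrite) is involved.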

Adding default value to existing mapping in elastic search

I have an index with mapping. I decided to add a new field to existing mapping:
{
  "properties": {
    "sexifield": {
      "type": "keyword",
      "null_value": "NULL"
    }
  }
}
As far as I understand, the field should appear in existing documents when I reindex. So when I use api to reindex:
{
  "source": {
    "index": "index_v1"
  },
  "dest": {
    "index": "index_v2",
    "version_type": "external"
  }
}
I see that the mapping for index_v2 does not contain sexifield, and the documents don't contain it either. The operation also took less than 60 ms.
Please point out what I am not understanding here.
Adding new documents to the first index (via the Java API, for an entity that doesn't have the sexifield field, so presumably Elasticsearch should apply the default) also does not create this additional field.
Thanks in advance for tips.
Regards
Great question, +1 (I learned something while solving your problem).
I don't know how to make reindexing take the second (destination) mapping into account, but here is how I would update all the documents in the reindexed index once the reindexing from the original index is done. I'm still researching whether there is a way to apply the default values defined in the second index's mapping during the reindex itself, but for now see if this solution helps.
POST /index_v2/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.sexifield = params.null_value",
    "params": {
      "null_value": "NULL"
    }
  }
}
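Worth knowing here: null_value in a mapping only kicks in when a document is indexed with an explicit null value for the field; it never rewrites _source, and a field that is simply absent stays absent. That's why the reindex alone doesn't add it. If you go the _update_by_query route, it may also be safer to only touch documents where the field is actually missing, so existing values aren't clobbered. A sketch of such a request body; the field name and default come from the question, the exists-guard is my addition:

```python
def fill_missing_field_body(field: str, default: str) -> dict:
    """_update_by_query body that sets a default only where the field is absent."""
    return {
        # Pre-filter so documents that already have the field aren't rewritten.
        "query": {"bool": {"must_not": {"exists": {"field": field}}}},
        "script": {
            "lang": "painless",
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": default},
        },
    }

# POST /index_v2/_update_by_query with fill_missing_field_body("sexifield", "NULL")
```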

Best way to reindex multiple indices in ElasticSearch

I am using Elasticsearch 5.1.1 and have 500 + indices created with default mapping provided by ES.
Now we have decided to use dynamic templates.
In order to apply this template/mapping to old indices I need to reindex all indices.
What is the best way to do it? Can we use Kibana for this ? Couldn't find sufficient documentation to do so.
Example: Reindex from a daily index to a monthly index (August)
POST _reindex?slices=10&refresh
{
  "source": {
    "index": "myindex-2019.08.*"
  },
  "dest": {
    "index": "myindex-2019.08"
  }
}
Monitor the reindex task (wait until it is finished):
GET _tasks?detailed=true&actions=*reindex
Check that the new index was created:
GET _cat/indices/myindex-2019.08*?v&s=index
Then you can delete the old indices:
DELETE myindex-2019.08.*
Source:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
You can use the _reindex API which can also reindex multiple indices. It was specifically built for this.
Bash script to re-index all indices matching a pattern: https://gist.github.com/hartfordfive/e507bc47e17f4e03a89055918900e44d
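The same idea as that gist can be sketched in a few lines: list the indices, filter by pattern, and fire one _reindex per match. The destination suffix and the glob matching below are illustrative assumptions, not anything Elasticsearch prescribes:

```python
import fnmatch

def bodies_for_pattern(all_indices, pattern, dest_suffix="-v2"):
    """One _reindex body per index matching a glob pattern.

    all_indices would typically come from GET _cat/indices?h=index.
    """
    return [
        {"source": {"index": name}, "dest": {"index": name + dest_suffix}}
        for name in sorted(all_indices)
        if fnmatch.fnmatch(name, pattern)
    ]

# Each returned dict is then POSTed to /_reindex, one request at a time,
# so a failure on one index doesn't abort the whole batch.
```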
If you want to filter on some field and reindex only the matching documents, you can use a query in the source:
POST _reindex
{
  "source": {
    "index": "auditbeat",
    "query": {
      "match": {
        "agent.version": "7.6.0"
      }
    }
  },
  "dest": {
    "index": "auditbeat-7.6.0"
  }
}

Reindex fail due to SearchContextMissingException

My company is using elasticsearch 2.3.4.
We have a cluster that contains 38 ES nodes, and we've been having a problem with reindexing some of our data lately...
We've reindexed very large indices before with no problems, but recently, when trying to reindex much smaller indices (less than 10 GB), we get: "SearchContextMissingException [No search context found for id [XXX]]".
We have no idea what's causing this problem or how to fix it. We'd like some guidance.
Has anyone seen this exception before?
From GitHub comments on related issues, I think this can be avoided by changing the batch size.
From documentation:
By default _reindex uses scroll batches of 1000. You can change the batch size with the size field in the source element:
POST _reindex
{
  "source": {
    "index": "source",
    "size": 100
  },
  "dest": {
    "index": "dest",
    "routing": "=cat"
  }
}
I had the same problem with an index that holds many huge documents. I had to reduce the batch size down to 10 (neither 100 nor 50 worked).
This was the request that worked in the end:
POST _reindex?slices=5&refresh
{
  "source": {
    "index": "source_index",
    "size": 10
  },
  "dest": {
    "index": "dest_index"
  }
}
You should also set slices to the number of shards in your index.