How to _reindex elasticsearch data to new mapping (from flat fields to objects)? - elasticsearch

I have an old Elasticsearch index with more than 20K documents. Each document has these fields:
{
"title": "Test title",
"title_ar": "عنوان تجريبي",
"body": "<p>......</p>"
}
I want to _reindex them so that all the data follows a new mapping like this:
{
"title_1": {
"en": "Test title",
"ar": "عنوان تجريبي"
},
"body": "<p>......</p>"
}
Which Elasticsearch ingest pipeline processor is best suited to doing this conversion through the _reindex API?

I suggest simply using the reindex API with a script to do this:
POST _reindex
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
},
"script": {
"source": "ctx._source.title = ['en': ctx._source.title, 'ar': ctx._source.title_ar]; ctx._source.remove('title_ar')",
"lang": "painless"
}
}
If old_index contains this document:
{
"title": "Test title",
"title_ar": "عنوان تجريبي",
"body": "<p>......</p>"
}
In your new index, you'll have this:
{
"title": {
"en": "Test title",
"ar": "عنوان تجريبي"
},
"body": "<p>......</p>"
}
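To check the conversion before running it against the cluster, here is a minimal Python sketch that mirrors the transformation on a plain dict. It does not talk to Elasticsearch. The function name to_title_object is hypothetical, and the sketch assumes the flat title_ar field should be dropped once it has been folded into the object, as in the result shown above.

```python
def to_title_object(source):
    """Fold the flat title/title_ar fields into a single title object."""
    doc = dict(source)  # shallow copy so the input document is left untouched
    # The right-hand side is evaluated first, so the old flat title is read
    # before it is overwritten; pop() also removes the flat title_ar field.
    doc["title"] = {"en": doc.get("title"), "ar": doc.pop("title_ar", None)}
    return doc


old_doc = {
    "title": "Test title",
    "title_ar": "عنوان تجريبي",
    "body": "<p>......</p>",
}
new_doc = to_title_object(old_doc)
print(new_doc)
```

Running the same check on a few real documents is a cheap way to spot ones where title_ar is missing; in this sketch they end up with "ar" set to None.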

Related

How to update a text type field in Elasticsearch to a keyword field, where each word becomes a keyword in a list?

I’m looking to update a field in Elasticsearch from text to keyword type.
I’ve tried changing the type from text to keyword in the mapping and then reindexing, but with this method the entire text value is converted into one big keyword. For example, ‘limited time offer’ is converted into one keyword, rather than being broken up into something like ['limited', 'time', 'offer'].
Is it possible to change a text field into a list of keywords, rather than one big keyword? Also, is there a way to do this with only a mapping change and then reindexing?
You need to create a new index and reindex into it through an ingest pipeline that splits the text into a list of words.
Pipeline
POST _ingest/pipeline/_simulate
{
"pipeline": {
"processors": [
{
"split": {
"field": "items",
"target_field": "new_list",
"separator": " ",
"preserve_trailing": true
}
}
]
},
"docs": [
{
"_index": "index",
"_id": "id",
"_source": {
"items": "limited time offer"
}
}
]
}
Results
{
"docs": [
{
"doc": {
"_index": "index",
"_id": "id",
"_version": "-3",
"_source": {
"items": "limited time offer",
"new_list": [
"limited",
"time",
"offer"
]
},
"_ingest": {
"timestamp": "2022-11-11T14:49:15.9814242Z"
}
}
}
]
}
Steps
1 - Create a new index
2 - Create a pipeline
PUT _ingest/pipeline/split_words_field
{
"processors": [
{
"split": {
"field": "items",
"target_field": "new_list",
"separator": " ",
"preserve_trailing": true
}
}
]
}
3 - Reindex with pipeline
POST _reindex
{
"source": {
"index": "idx_01"
},
"dest": {
"index": "idx_02",
"pipeline": "split_words_field"
}
}
Example:
PUT _ingest/pipeline/split_words_field
{
"processors": [
{
"split": {
"field": "items",
"target_field": "new_list",
"separator": " ",
"preserve_trailing": true
}
}
]
}
POST idx_01/_doc
{
"items": "limited time offer"
}
POST _reindex
{
"source": {
"index": "idx_01"
},
"dest": {
"index": "idx_02",
"pipeline": "split_words_field"
}
}
GET idx_02/_search
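For a quick local sanity check of what the split step should produce, here is a rough Python sketch. It only imitates the processor on a dict and is not the processor itself. The function name split_processor is hypothetical, and the sketch assumes the separator is treated as a regular expression and that trailing empty strings are kept, which is what "preserve_trailing": true asks for.

```python
import re


def split_processor(source, field, target_field, separator):
    """Imitate the split step: write the pieces of `field` into `target_field`."""
    doc = dict(source)  # shallow copy, the original document is not modified
    # re.split keeps trailing empty strings, similar to preserve_trailing: true
    doc[target_field] = re.split(separator, doc[field])
    return doc


result = split_processor({"items": "limited time offer"}, "items", "new_list", " ")
print(result["new_list"])
```

Note that the original text field is kept next to the new list, just as in the simulate output above.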

Term aggregation on ElasticSearch join

I would like to perform an aggregation on a join relation using ElasticSearch 7.7.
I need to know how many children I have for each parent.
The only way I have found is to use a script inside a terms aggregation, but I am concerned about performance.
GET /my_index/_search
{
"size": 0,
"aggs": {
"total": {
"terms": {
"script": {
"lang": "painless",
"source": "params['_source']['my_join']['parent']"
}
}
},
"max_total": {
"max_bucket": {
"buckets_path": "total>_count"
}
}
}
}
Does anyone know a faster way to run this aggregation without the script?
If the join field weren't a parent/child relation, I could replace the terms aggregation with:
"terms": { "field": "my_field" }
For more context, here is the mapping along with some sample documents:
{
"mappings": {
"properties": {
"my_join": {
"relations": {
"other": "doc"
},
"type": "join"
},
"reader": {
"type": "keyword"
},
"name": {
"type": "text"
},
"content": {
"type": "text"
}
}
}
}
PUT example/_doc/1
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/2
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/3?routing=1
{
"content": "abc",
"my_join": {
"name": "doc",
"parent": 1
}
}
PUT example/_doc/4?routing=2
{
"content": "def",
"my_join": {
"name": "doc",
"parent": 2
}
}
PUT example/_doc/5?routing=1
{
"content": "def",
"my_join": {
"name": "doc",
"parent": 1
}
}
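One script-free option that may be worth trying: according to the parent-join documentation, a join field also creates one extra field per relation, named after the join field followed by # and the parent relation name, and that field can be used in aggregations. For the mapping above that would be my_join#other. The Python sketch below only builds such a request body and cross-checks the expected counts locally on the sample documents. The helper names are hypothetical, the term filter on the relation name is an assumption meant to keep parent documents out of the buckets, and document 5 is assumed to use my_join like the others.

```python
import json
from collections import Counter


def children_per_parent_request(join_field, parent_relation, child_relation):
    """Build a search body that aggregates on the per-relation parent id field."""
    return {
        "size": 0,
        # Assumption: only child documents should be counted.
        "query": {"term": {join_field: child_relation}},
        "aggs": {
            "total": {"terms": {"field": f"{join_field}#{parent_relation}"}},
            "max_total": {"max_bucket": {"buckets_path": "total>_count"}},
        },
    }


def count_children(join_values, child_relation):
    """Local cross-check: count child documents per parent id."""
    return Counter(j["parent"] for j in join_values if j["name"] == child_relation)


request_body = children_per_parent_request("my_join", "other", "doc")
print(json.dumps(request_body, indent=2))

# Join values of the five sample documents (ids 1 to 5).
sample_joins = [
    {"name": "other"},
    {"name": "other"},
    {"name": "doc", "parent": 1},
    {"name": "doc", "parent": 2},
    {"name": "doc", "parent": 1},
]
counts = count_children(sample_joins, "doc")
print(dict(counts), max(counts.values()))
```

If the request behaves as assumed, the buckets for the sample data should match the local counts: two children for parent 1 and one for parent 2.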

elasticsearch reindex. select nested fields

Is it possible to select particular nested fields for reindexing?
According to the docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-filter-source), the selected fields are given as an array:
POST _reindex
{
"source": {
"index": "twitter",
"_source": ["user", "_doc"]
},
"dest": {
"index": "new_twitter"
}
}
For example, we need to reindex only the nested fields "name" and "birthdate" of user.
How could this be done? We need something like this:
POST _reindex
{
"source": {
"index": "twitter",
"_source": { "user": ["name", "birthdate"], "_doc"]
},
"dest": {
"index": "new_twitter"
}
}
Use dot notation to refer to the nested fields:
POST _reindex
{
"source": {
"index": "twitter",
"_source": ["user.name", "user.birthdate", "_doc"]
},
"dest": {
"index": "twitter_new"
}
}
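To see which parts of a document a dotted field list would keep, here is a small Python sketch of the idea. It is a simplification on plain dicts, not Elasticsearch's actual source filtering: it does not handle wildcards or arrays of objects, and the function name filter_source and the sample tweet are made up for illustration.

```python
def filter_source(source, includes):
    """Keep only the dotted paths listed in `includes` that exist in `source`."""
    result = {}
    for path in includes:
        keys = path.split(".")
        node = source
        for key in keys:  # walk down to check that the whole path exists
            if not isinstance(node, dict) or key not in node:
                break
            node = node[key]
        else:
            # Path found: rebuild the same nesting in the result.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = node
    return result


tweet = {
    "user": {"name": "kimchy", "birthdate": "1990-01-01", "email": "k@example.com"},
    "message": "hello",
}
filtered = filter_source(tweet, ["user.name", "user.birthdate", "_doc"])
print(filtered)
```

Paths that do not exist in the document, such as "_doc" in this sample, are simply skipped.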

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to be clear: I know I will lose the clicks data, and that is not important). This means the final new index schema would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each row has exactly one creativeStatistics entry, so there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using Painless scripts, but I don't know how to do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create a Pipeline by creating a Script Processor as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if a document has multiple nested objects, the script keeps the last object's creativeId. creativeId is only added if at least one object in the document's creativeStatistics has one.
You can then reference the pipeline in the reindex request:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}
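The two approaches above differ once a document has more than one creativeStatistics entry: the reindex script takes the first entry, while the pipeline script ends up with the last one. Here is a rough Python sketch of both behaviours on a plain dict. The function names and the sample document are hypothetical, and nothing here runs against Elasticsearch.

```python
import copy


def first_creative_id(source):
    """Mirror of the reindex script: take creativeId from the first entry."""
    doc = copy.deepcopy(source)
    stats = doc.get("creativeStatistics")
    if stats:  # skips documents where the field is missing or empty
        doc["creativeId"] = stats[0].get("creativeId")
        del doc["creativeStatistics"]
    return doc


def last_creative_id(source):
    """Mirror of the pipeline: the last non-null creativeId wins."""
    doc = copy.deepcopy(source)
    # Like the pipeline, this version assumes creativeStatistics is present.
    for item in doc["creativeStatistics"]:
        if item.get("creativeId") is not None:
            doc["creativeId"] = item["creativeId"]
    del doc["creativeStatistics"]
    return doc


ad = {
    "adId": "a1",
    "creativeStatistics": [
        {"clicks": 3, "creativeId": "c1"},
        {"clicks": 7, "creativeId": "c2"},
    ],
}
print(first_creative_id(ad)["creativeId"], last_creative_id(ad)["creativeId"])
```

With exactly one entry per document, as in the question, both versions give the same result.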

Elastic Search change index to a document

How can I change the _index of an existing document in Elasticsearch?
Example:
1) I create an index:
PUT /customer?pretty
2) I add a document:
POST /customer/_doc?pretty
{
"name": "John Doe"
}
3) I create another index:
PUT /customer2?pretty
How do I move the document created in step 2 into the new _index customer2?
POST _reindex
{
"source": {
"index": "customer",
"type": "_doc",
"query": {
"term": {
"_id": "fMn2OmcBEGEHUvm1g7Mi"
}
}
},
"dest": {
"index": "customer2"
}
}
DELETE /customer/_doc/fMn2OmcBEGEHUvm1g7Mi
where "fMn2OmcBEGEHUvm1g7Mi" is the id of the document.
There is no way to edit a document's metadata fields, such as _index, in place. The best option is to reindex into a new index and then delete the old index.
POST _reindex
{
"source": {
"index": "customer"
},
"dest": {
"index": "customer2"
}
}
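Either way, a "move" boils down to a copy followed by a delete on the source side. As a toy illustration only, here is a Python sketch that uses in-memory dicts as stand-ins for the two indices; the function name move_document and the data layout are hypothetical.

```python
def move_document(indices, doc_id, source_index, dest_index):
    """Copy a document to dest_index, then delete it from source_index."""
    # Step 1: copy, which plays the role of the _reindex request.
    indices[dest_index][doc_id] = indices[source_index][doc_id]
    # Step 2: delete the original from the SOURCE index, not the destination.
    del indices[source_index][doc_id]


indices = {
    "customer": {"fMn2OmcBEGEHUvm1g7Mi": {"name": "John Doe"}},
    "customer2": {},
}
move_document(indices, "fMn2OmcBEGEHUvm1g7Mi", "customer", "customer2")
print(indices)
```

The order matters: deleting before the copy has been confirmed would risk losing the document.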
