In Elasticsearch, how to move data from one field into another field

I have an index with mappings that look like this:
"mappings": {
"default": {
"_all": {
"enabled": false
},
"properties": {
"Foo": {
"properties": {
"Bar": {
"type": "keyword"
}
}
}
}
}
I am trying to change the mapping to introduce a sub-field of Bar, called Code, whilst migrating the string currently in Bar into Bar.Code. Here is the new mapping:
"mappings": {
"default": {
"_all": {
"enabled": false
},
"properties": {
"Foo": {
"properties": {
"Bar": {
"properties": {
"Code": {
"type": "keyword"
}
}
}
}
}
}
}
In order to do this, I think I need to do a _reindex and specify a pipeline. Is that correct? If so, how does my pipeline access the original data?
I have tried variations on the following code, but without success:
PUT _ingest/pipeline/transformFooBar
{
"processors": [
{
"set": {
"field": "Bar.Code",
"value": "{{_source.Bar}}"
}
}
]
}
POST _reindex
{
"source": {
"index": "foo_v1"
},
"dest": {
"index": "foo_v2",
"pipeline": "transformFooBar"
}
}

Ah, I almost had the syntax right. The _source prefix is not required:
// Create a pipeline with a SET processor
PUT _ingest/pipeline/transformFooBar
{
"processors": [
{
"set": {
"field": "Bar.Code",
"value": "{{Bar}}"
}
}
]
}
// Reindex using the above pipeline
POST _reindex
{
"source": {
"index": "foo_v1"
},
"dest": {
"index": "foo_v2",
"pipeline": "transformFooBar"
}
}
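Once the reindex finishes, a quick way to sanity-check the migration (using the foo_v2 index name from above) is to pull back a few documents and inspect _source to confirm the value now sits under the Code sub-field:
// Fetch a handful of documents from the new index and inspect _source
GET foo_v2/_search
{
  "size": 5,
  "query": {
    "match_all": {}
  }
}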

Related

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in the new index and just save the creativeId as a new keyword (to be clear: I know I will lose the clicks data, and that is not important). This means the final new index schema would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each document has exactly one creativeStatistics entry, so there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using Painless scripts, but I don't know how to do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create a pipeline with a script processor, as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if you have multiple nested objects, it will keep the last object's creativeId, and it will only add creativeId if a source document has one in its creativeStatistics.
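Before running the reindex, you can dry-run the pipeline with the simulate API; below is a sketch using a made-up sample document shaped like the source mapping (field values are illustrative):
// Dry-run the pipeline against a sample document
POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "adId": "ad-1",
        "name": "some name",
        "title": "some title",
        "creativeStatistics": [
          { "clicks": 10, "creativeId": "creative-42" }
        ]
      }
    }
  ]
}
The response shows the transformed document: creativeId copied to the root and creativeStatistics removed.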
Below is how you can then use the reindex API with that pipeline:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}

Nested query in ElasticSearch - two levels

I have the next mapping :
"c_index": {
"aliases": {},
"mappings": {
"an": {
"properties": {
"id": {
"type": "string"
},
"sm": {
"type": "nested",
"properties": {
"cr": {
"type": "nested",
"properties": {
"c": {
"type": "string"
},
"e": {
"type": "long"
},
"id": {
"type": "string"
},
"s": {
"type": "long"
}
}
},
"id": {
"type": "string"
}
}
}
}
}
}
And I need a query that gives me all the cr's when:
an.id == x and sm.id == y
I tried with:
{"query":{"bool":{"should":[{"terms": {"_id": ["x"]}},
{"nested":{"path": "sm","query":{
"match": {"sm.id":"y"}}}}]}}}
But it runs very slowly and gives more info than I need.
What's the most efficient way to do that? Thank you!
You don't need a nested query here. Also, use filter instead of should if you want to find documents matching all the queries. (The exception would be if you wanted a query to affect the score, like a match query, which is not the case here; then you could use should plus the minimum_should_match option.)
{
"query": {
"bool": {
"filter": [
{ "term": { "_id": "x" } },
{ "term": { "sm.id": "y" } }
]
}
}
}
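If you only want the cr objects back rather than the whole documents, one option (a sketch, not part of the original answer) is to add source filtering on top of the same query so that only sm.cr is returned:
{
  "_source": ["sm.cr"],
  "query": {
    "bool": {
      "filter": [
        { "term": { "_id": "x" } },
        { "term": { "sm.id": "y" } }
      ]
    }
  }
}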

Elastic search range not working for double indexed as long

I have mapped a field as long, but the input data is decimal (100.123).
I've tried various range searches and none of them work. I've verified that the data is in the proper index, and I can find the documents if I search for missing/exists.
Range query:
"range": {
"nr_val": {
"from": 123,
"to": 1234
}
}
Is Elasticsearch just ignoring the values, treating them as strings in a range search?
So in my situation, what can I do to make a range search with from: 100 and to: 200 match 100.123, other than a full dump and re-import? Are there any conversion options available?
Update with detailed specs
{
"state": "open",
"settings": {
"index": {
"creation_date": "1447858537098",
"number_of_shards": "5",
"uuid": "iiPzQXasQadvnDF1da8oMw",
"version": {
"created": "1070299"
},
"number_of_replicas": "1"
}
},
"mappings": {
"mongo_doc": {
"properties": {
"parent": {
"type": "string"
},
"data.current.specs.nr._nrm_val": {
"type": "double"
},
"data.current.specs.nr_b._nrm_val": {
"type": "double"
},
"data": {
"properties": {
"current": {
"properties": {
"specs": {
"properties": {
"nr": {
"properties": {
"_nrm_val": {
"type": "double"
}
}
},
"nr_b": {
"properties": {
"_nrm_val": {
"type": "long"
}
}
}
}
}
}
}
}
}
}
}
},
"aliases": []
}
It seems the mapping was not quite right... I switched to the ['data']['properties']['current']['properties'](...) notation.
In your case that field should have been mapped as double, not long. The indexed value for 100.123 is 100, so you lose the decimals.
At this point, other than re-indexing (which is the ideal fix), a scripted filter will probably do it:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source['nr'].value >= param1 && _source['nr'].value <= param2",
"params": {
"param1": 100,
"param2": 200
}
}
}
}
}
}
but it will be expensive because of the _source loading.
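If re-indexing is an option, the clean fix is a new index with the numeric field mapped as double. Below is a minimal sketch with illustrative index names and the nr_val field from the range query above; note that the _reindex API only exists in Elasticsearch 2.3 and later, so on the 1.x cluster shown here you would re-import from the source data instead:
// New index with the field mapped as double (names are illustrative)
PUT nr_index_v2
{
  "mappings": {
    "mongo_doc": {
      "properties": {
        "nr_val": { "type": "double" }
      }
    }
  }
}
// Copy the data across (requires the _reindex API)
POST _reindex
{
  "source": { "index": "nr_index_v1" },
  "dest": { "index": "nr_index_v2" }
}
After that, a range query with from: 100 and to: 200 matches 100.123 as expected.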

Elasticsearch Aggregation - Unable to perform aggregation to object

I have a mapping with an inner object as follows:
{
"mappings": {
"_all": {
"enabled": false
},
"properties": {
"foo": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"address": {
"type": "object",
"properties": {
"address": {
"type": "string"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
When I try the following aggregation it does not return any data:
post data:*/foo/_search?search_type=count
{
"query": {
"match_all": {}
},
"aggs": {
"unique": {
"cardinality": {
"field": "address.city"
}
}
}
}
When I try the field city or address.city, the aggregation returns zero, but if I put foo.address.city I get the correct response from Elasticsearch. This also affects Kibana behavior.
Any ideas why this is happening? I saw there is a mapping refactoring that might affect this. I am using Elasticsearch version 1.7.1.
To add to this: if I use the relative path in a search query as follows, it works normally:
"query": {
"filtered": {
"filter": {
"term": {
"address.city": "london"
}
}
}
}
Seems it's this same issue.
This is seen when the type name and the field name are the same.
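In other words, until the type name and field name no longer clash, the workaround the asker found is to prefix the field with the type name; the same cardinality aggregation then returns data:
post data:*/foo/_search?search_type=count
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "foo.address.city"
      }
    }
  }
}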

ElasticSearch Snowball Analyzer not working with nested query

I have created an index with the following mapping
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"anlayzer": "snowball"
},
"fr-FR": {
"type": "string",
"anlayzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
I have put one record using:
PUT http://localhost:9200/test1/searchText/1
{
"catalogue_product": {
"id": "18437",
"long_desc": {
"translation": {
"en-GB": "C120 - circuit breaker - C120H - 4P - 125A - B curve",
"fr-FR": "Disjoncteur C120H 4P 125A courbe B 15000A"
}
}
}
}
Then if I do a search for the word breaker inside catalogue_product.long_desc.translation.en-GB, I get the added record:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breaker"
}
}
}
}
}
If I replace the word breaker with breakers, I don't get any records, in spite of the en-GB field having analyzer=snowball in the mapping:
POST http://localhost:9200/test1/searchText/_search
{
"query": {
"nested": {
"path": "catalogue_product.long_desc.translation",
"query": {
"match": {
"catalogue_product.long_desc.translation.en-GB": "breakers"
}
}
}
}
}
I am going crazy with this. Where am I going wrong?
I tried a new mapping with the english analyzer instead of snowball, but that did not work either :(
Any help is appreciated.
Dude, it's a typo. It's analyzer, not anlayzer:
PUT http://localhost:9200/test1
{
"mappings": {
"searchText": {
"properties": {
"catalogue_product": {
"type":"nested",
"properties": {
"id": {
"type": "string",
"index":"not_analyzed"
},
"long_desc": {
"type":"nested",
"properties": {
"translation": {
"type":"nested",
"properties": {
"en-GB": {
"type": "string",
"analyzer": "snowball"
},
"fr-FR": {
"type": "string",
"analyzer": "snowball"
}
}
}
}
}
}
}
}
}
}
}
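With the corrected mapping in place (and the data re-indexed so the field is actually analyzed with snowball), you can double-check the stemming with the analyze API; on the 1.x/2.x versions this question targets, the parameters go on the query string (newer versions take a JSON body instead):
GET http://localhost:9200/test1/_analyze?analyzer=snowball&text=breakers
The returned token is breaker, which is why both breaker and breakers then match the document.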
