Elasticsearch remove a field from an object of an array in a dynamically generated index - elasticsearch

I'm trying to delete fields from an object of an array in Elasticsearch. The index has been dynamically generated.
This is the mapping:
{
"mapping": {
"_doc": {
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"result": {
"properties": {
"resultid": {
"type": "long"
},
"resultname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
},
"timestamp": {
"type": "date"
}
}
}
}
}
}
this is a document:
{
"result": [
{
"resultid": 69,
"resultname": "SFO"
},
{
"resultid": 151,
"resultname": "NYC"
}
],
"age": 54,
"name": "Jorge",
"timestamp": "2020-04-02T16:07:47.292000"
}
My goals is to remove all the fields resultid in result in all the document of the index. After update the document should look like this:
{
"result": [
{
"resultname": "SFO"
},
{
"resultname": "NYC"
}
],
"age": 54,
"name": "Jorge",
"timestamp": "2020-04-02T16:07:47.292000"
}
I tried using the following articles on stackoverflow but with no luck:
Remove elements/objects From Array in ElasticSearch Followed by Matching Query
remove objects from array that satisfying the condition in elastic search with javascript api
Delete nested array in elasticsearch
Removing objects from nested fields in ElasticSearch
Hopefully someone can help me find a solution.

You should reindex your index in a new one with _reindex API and call a script to remove your fields :
POST _reindex
{
"source": {
"index": "my-index"
},
"dest": {
"index": "my-index-reindex"
},
"script": {
"source": """
for (int i=0;i<ctx._source.result.length;i++) {
ctx._source.result[i].remove("resultid")
}
"""
}
}
After you can delete your first index :
DELETE my-index
And reindex it :
POST _reindex
{
"source": {
"index": "my-index-reindex"
},
"dest": {
"index": "my-index"
}
}

I combined the answer from Luc E with some of my own knowledge in order to reach a solution without reindexing.
POST INDEXNAME/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
{
"script": {
"source": "for (int i=0;i<ctx._source.result.length;i++) { ctx._source.result[i].remove(\"resultid\")}"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "result.id"
}
}
]
}
}
}
Thanks again Luc!

If your array has more than one copy of element you want to remove. Use this:
ctx._source.some_array.removeIf(tag -> tag == params['c'])

Related

Update field in a document based on the condition in Kibana/Elasticsearch

I am trying to update particular field in document based on some condition. In general sql way, I want to do following.
Update index indexname
set name = "XXXXXX"
where source: file and name : "YYYYYY"
I am using below to update all the documents but I am not able to add any condition.
POST indexname/_update_by_query
{
"query": {
"term": {
"name": "XXXXX"
}
}
}
Here is the template, I am using:
{
"indexname": {
"mappings": {
"idxname123": {
"_all": {
"enabled": false
},
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"date1": {
"type": "date",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"source": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Could someone guide me how to add condition to it as mentioned above for the source and name.
Thanks,
Babu
You can make use of the below query to what you are looking for. I'm assuming name and source are your fields in your index.
POST <your_index_name>/_update_by_query
{
"script": {
"inline": "ctx._source.name = 'XXXXX'",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"term": {
"name": {
"value": "YYYYY"
}
}
},
{
"term": {
"source": {
"value": "file"
}
}
}
]
}
}
}
You can probably make use of any of the Full Text Queries or Term Queries inside the Bool Query for either searching/updating/deletions.
Do spend sometime in going through them.
Note: Make use of Term Queries only if your field's datatype is keyword
Hope this helps!

elasticsearch reindex nested object's element to keyword

I have an index structured like below:
"my_index": {
"mappings": {
"my_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeStatistics": {
"type": "nested",
"properties": {
"clicks": {
"type": "long"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
}
}
I need to remove the nested object in a new index and just save the creativeId as a new keyword (to make it clear: I know I will loose the clicks data, and it is not important). It means the final new index scheme would be:
"my_new_index": {
"mappings": {
"my_new_index": {
"properties": {
"adId": {
"type": "keyword"
},
"name": {
"type": "keyword"
},
"title": {
"type": "keyword"
},
"creativeId": {
"type": "keyword"
}
}
}
}
}
Right now each row has exactly one creativeStatistics. and therefore there is no complexity in selecting one of the creativeIds.
I know it is possible to reindex using painless scripts, but I don't know how can I do that. Any help will be appreciated.
You can do it like this:
POST _reindex
{
"source": {
"index": "my_old_index"
},
"dest": {
"index": "my_new_index"
},
"script": {
"source": "if (ctx._source.creativeStatistics != null && ctx._source.creativeStatistics.size() > 0) {ctx._source.creativeId = ctx._source.creativeStatistics[0].creativeId; ctx._source.remove('creativeStatistics')}",
"lang": "painless"
}
}
You can also create a Pipeline by creating a Script Processor as follows:
PUT _ingest/pipeline/my_pipeline
{
"description" : "My pipeline",
"processors" : [
{ "script" : {
"source": "for (item in ctx.creativeStatistics) { if(item.creativeId!=null) {ctx.creativeId = item.creativeId;} }"
}
},
{
"remove": {
"field": "creativeStatistics"
}
}
]
}
Note that if you have multiple nested objects, it would append the last object's creativeId. And it would only add creativeId if a source document has one in its creativeStatistics.
Below is how you can then use reindex query:
POST _reindex
{
"source": {
"index": "creativeindex_src"
},
"dest": {
"index": "creativeindex_dest",
"pipeline": "my_pipeline"
}
}

How can I get parent with all children in one query

I have following mapping:
PUT /test_products
{
"mappings": {
"_doc": {
"properties": {
"type": {
"type": "keyword"
},
"name": {
"type": "text"
},
"entity_id": {
"type": "integer"
},
"weighted": {
"type": "integer"
}
"product_relation": {
"type": "join",
"relations": {
"window": "simple"
}
}
}
}
}
}
I want to get "window" products with all "simple"s but only where one or more "simple"s have property "weighted" = 1
I wrote following query:
GET test_products/_search
{
"query": {
"has_child": {
"type": "simple",
"query": {
"term": {
"weighted": 1
}
},
"inner_hits": {}
}
}
}
But I've got "window"s with "simple"s which are match to the term. In other words I want to filter "window"s list by "simple"'s option and get all matched "window"s with all their "simple"s. Is it possible without "nested" in one query? Or I have to do some queries?
OK. Luckily, I need to get only one "window" product with all it's children by it's ID, so I found parent_id query which can helps me with this task.
Now I have following query:
GET test_products/_search
{
"query": {
"parent_id": {
"type": "simple",
"id": "window-1"
}
}
}
Unfortunately, I have to execute 2 queries (has_child and then parent_id) instead of one but it's OK for me.

Nested query in ElasticSearch - two levels

I have the next mapping :
"c_index": {
"aliases": {},
"mappings": {
"an": {
"properties": {
"id": {
"type": "string"
},
"sm": {
"type": "nested",
"properties": {
"cr": {
"type": "nested",
"properties": {
"c": {
"type": "string"
},
"e": {
"type": "long"
},
"id": {
"type": "string"
},
"s": {
"type": "long"
}
}
},
"id": {
"type": "string"
}
}
}
}
}
}
And I need a query than gives me all the cr's when:
an.id == x and sm.id == y
I tried with :
{"query":{"bool":{"should":[{"terms": {"_id": ["x"]}},
{"nested":{"path": "sm","query":{
"match": {"sm.id":"y"}}}}]}}}
But runs very slow and gives more info than i need.
What's the most efficient way to do that ? Thank you!
You don't need nested query here. Also, use filter instead of should if you want to find documents matching all the queries (the exception would be if you wanted the query to affect the score, like match query, which is not the case here, then you could use should + minimum_should_match option)
{
"query": {
"bool": {
"filter": [
{ "term": { "_id": "x" } },
{ "term": { "sm.id": "y" } }
]
}
}
}

Elastic search range not working for double indexed as long

I have mapped a field as long, but the input data is decimal (100.123).
I've tried any range search and it doesn't work. I've verified and the data is in the proper index and I can find them if I search for missing/exists.
Range query:
"range": {
"nr_val": {
"from": 123,
"to": 1234
}
}
Is Elasticsearch just ignoring the values, treating them as strings in a range search ?
So in my situation, what can I do to make a range search from:100, to:200 work for 100.123 other than a full dump and re-import? Are there any conversion options available?
Update with detailed specs
{
"state": "open",
"settings": {
"index": {
"creation_date": "1447858537098",
"number_of_shards": "5",
"uuid": "iiPzQXasQadvnDF1da8oMw",
"version": {
"created": "1070299"
},
"number_of_replicas": "1"
}
},
"mappings": {
"mongo_doc": {
"properties": {
"parent": {
"type": "string"
},
"data.current.specs.nr._nrm_val": {
"type": "double"
},
"data.current.specs.nr_b._nrm_val": {
"type": "double"
},
"data": {
"properties": {
"current": {
"properties": {
"specs": {
"properties": {
"nr": {
"properties": {
"_nrm_val": {
"type": "double"
}
}
},
"nr_b": {
"properties": {
"_nrm_val": {
"type": "long"
}
}
}
}
}
}
}
}
}
}
}
},
"aliases": []
}
Seems that the mapping is not quite right... switched to ['data']['properties']['current']['properties'](...) notation.
In your case that field should have been double, not long. And the indexed value for 100.123 is 100 and you loose the decimals.
At this point, other than re-indexing which is ideal, probably just scripted filtering will do it:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "_source['nr'].value >= param1 && _source['nr'].value <= param2",
"params": {
"param1": 100,
"param2": 200
}
}
}
}
}
}
but it will be expensive because of the _source loading.

Resources