Elasticsearch array of objects nested range aggregation

I'm trying to make range aggregation on the following data set:
{
"ProductType": 1,
"ProductDefinition": "fc588f8e-14f2-4871-891f-c73a4e3d17ca",
"ParentProduct": null,
"Sku": "074617",
"VariantSku": null,
"Name": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"AllowOrdering": true,
"Rating": null,
"ThumbnailImageUrl": "/media/1106/074617.jpg",
"PrimaryImageUrl": "/media/1106/074617.jpg",
"Categories": [
"399d7b20-18cc-46c0-b63e-79eadb9390c7"
],
"RelatedProducts": [],
"Variants": [
"84a7ff9f-edf0-4aab-87f9-ba4efd44db74",
"e2eb2c50-6abc-4fbe-8fc8-89e6644b23ef",
"a7e16ccc-c14f-42f5-afb2-9b7d9aefbc5c"
],
"PriceGroups": [
"86182755-519f-4e05-96ef-5f93a59bbaec"
],
"DisplayName": "Paraboot Avoriaz/Jannu Marron Brut Marron Brown Hiking Boot Shoes",
"ShortDescription": "",
"LongDescription": "<ul><li>Paraboot Avoriaz Mountaineering Boots</li><li>Marron Brut Marron (Brown)</li><li>Full leather inners and uppers</li><li>Norwegien Welted Commando Sole</li><li>Hand made in France</li><li>Style number : 074617</li></ul><p>As featured on Pritchards.co.uk</p>",
"UnitPrices": {
"EUR 15 pct": 343.85
},
"Taxes": {
"EUR 15 pct": 51.5775
},
"PricesInclTax": {
"EUR 15 pct": 395.4275
},
"Slug": "paraboot-avoriazjannu-marron-brut-marron-brown-hiking-boot-shoes",
"VariantsProperties": [
{
"Key": "ShoeSize",
"Value": "8"
},
{
"Key": "ShoeSize",
"Value": "10"
},
{
"Key": "ShoeSize",
"Value": "6"
}
],
"Guid": "0d4f6899-c66a-4416-8f5d-26822c3b57ae",
"Id": 178,
"ShowOnHomepage": true
}
I'm aggregating on VariantsProperties which have the following mapping
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword"
}
}
}
Terms aggregations are working fine with following code:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"terms": {
"field": "VariantsProperties.Key"
},
"aggs": {
"values": {
"terms": {
"field": "VariantsProperties.Value"
}
}
}
}
}
}
}
}
However when I try to do a range aggregation to get shoes in size between 8 - 12 such as:
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"field": "VariantsProperties.Value",
"ranges": [ { "from": 8, "to": 12 }]
}
}
}
}
}
}
I get the following error:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "product-avenueproductindexdefinition-24476f82-en-us",
"node": "ejgN4XecT1SUfgrhzP8uZg",
"reason": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
],
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Field [VariantsProperties.Value] of type [keyword] is not supported for aggregation [range]"
}
}
},
"status": 400
}
Is there a way to "transform" the terms aggregation into a range aggregation without changing the schema? I know I could build the ranges myself from the results of the terms aggregation; however, I would prefer a solution within Elasticsearch itself.

There are two ways to solve this:
Option A: Use a script instead of a field. This option will work without having to reindex your data, but depending on your volume of data, the performance might suffer.
POST test/_search
{
"aggs": {
"Nest": {
"nested": {
"path": "VariantsProperties"
},
"aggs": {
"fieldIds": {
"range": {
"script": "Integer.parseInt(doc['VariantsProperties.Value'].value)",
"ranges": [
{
"from": 8,
"to": 12
}
]
}
}
}
}
}
}
Option B: Add an integer sub-field in your mapping.
PUT my-index/_mapping
{
"properties": {
"VariantsProperties": {
"type": "nested",
"properties": {
"Key": {
"type": "keyword"
},
"Value": {
"type": "keyword",
"fields": {
"numeric": {
"type": "integer",
"ignore_malformed": true
}
}
}
}
}
}
}
Once your mapping is modified, you can run _update_by_query on your index so that the existing VariantsProperties.Value data gets indexed into the new sub-field:
POST my-index/_update_by_query
Finally, when this last command is done, you can run the range aggregation on the VariantsProperties.Value.numeric field.
Also note that this second option will be more performant in the long term.
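The final request for Option B is described but not shown. A minimal sketch in Python, assuming the numeric sub-field from the mapping above has been populated. The helper name nested_range_agg is hypothetical, and the code only assembles the JSON body; it does not contact a cluster:

```python
import json

def nested_range_agg(path, field, ranges, name="fieldIds"):
    """Build a search body with a range aggregation wrapped in a nested aggregation.

    Hypothetical helper for illustration; it only assembles the request body.
    """
    return {
        "size": 0,  # return aggregation results only (an addition, not in the original requests)
        "aggs": {
            "Nest": {
                "nested": {"path": path},
                "aggs": {
                    name: {
                        "range": {
                            "field": f"{path}.{field}",
                            "ranges": ranges,
                        }
                    }
                },
            }
        },
    }

# Same range as in the question, but on the integer sub-field instead of the keyword field.
body = nested_range_agg("VariantsProperties", "Value.numeric", [{"from": 8, "to": 12}])
print(json.dumps(body, indent=2))
```

In a range aggregation, "from" is inclusive and "to" is exclusive, so an upper bound of 13 would be needed to include size 12 in the bucket.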

Related

How to query documents where a rank_features field is missing?

I have an index with a few hundred thousand documents. Some of them have a rank_features field called my_field. I want to retrieve documents without that field.
I tried:
"query": {
"bool": {
"must_not": [
{"exists": {"field":"my_field"}}]
...
But I get the following error:
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: [rank_features] fields do not support [exists] queries",
...
The index mapping is defined as follows:
"mappings": {
"dynamic": "strict",
"_routing": {
"required": true
},
"properties": {
"my_field": {
"properties": {
"my_subfield": {
"type": "rank_features"
}
}
...
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"mapping": {
"total_fields": {
"limit": "2000"
}
},
"refresh_interval": "1s",
"number_of_shards": "10",
"blocks": {
"write": "false"
},
Note that despite the mapping being strict, this field was added recently and older documents don't have it.
TL;DR
You are running an exists query against a field that only supports rank_feature queries.
As per the documentation of the rank_features field:
rank_features fields do not support sorting or aggregating and may only be queried using rank_feature queries.

Elastic Search Wildcard query with space failing 7.11

My data is indexed in Elasticsearch 7.11. This is the mapping I got when I added documents directly to my index.
{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}
I haven't added the keyword part and have no idea where it came from.
I am running a wildcard query on this field, but I am unable to get results for search terms that contain spaces.
{
"query": {
"bool":{
"should":[
{"wildcard": {"name":"*hello world*"}}
]
}
}
}
I have seen many answers mentioning not_analyzed, and I have tried setting {"index":"true"} in the mapping, but it did not help. How can I make the wildcard search work in this version of Elasticsearch?
I tried changing the field to the wildcard type:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type" :"wildcard"
}
}
}
And got the following response:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [name] cannot be changed from type [text] to [wildcard]"
},
"status": 400
}
Adding a sample document to match
{
"_index": "accelerators",
"_type": "_doc",
"_id": "602ec047a70f7f30bcf75dec",
"_score": 1.0,
"_source": {
"acc_id": "602ec047a70f7f30bcf75dec",
"name": "hello world example",
"type": "Accelerator",
"description": "khdkhfk ldsjl klsdkl",
"teamMembers": [
{
"userId": "karthik.r#gmail.com",
"name": "Karthik Ganesh R",
"shortName": "KR",
"isOwner": true
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS",
"isOwner": false
}
],
"sectorObj": [
{
"item_id": 14,
"item_text": "Cross-sector"
}
],
"geographyObj": [
{
"item_id": 4,
"item_text": "Global"
}
],
"technologyObj": [
{
"item_id": 1,
"item_text": "Artificial Intelligence"
}
],
"themeColor": 1,
"mainImage": "assets/images/Graphics/Asset 35.svg",
"features": [
{
"name": "Ideation",
"icon": "Asset 1007.svg"
},
{
"name": "Innovation",
"icon": "Asset 1044.svg"
},
{
"name": "Strategy",
"icon": "Asset 1129.svg"
},
{
"name": "Intuitive",
"icon": "Asset 964.svg"
}
],
"logo": {
"actualFileName": "",
"fileExtension": "",
"fileName": "",
"fileSize": 0,
"fileUrl": ""
},
"customLogo": {
"logoColor": "#B9241C",
"logoText": "EC",
"logoTextColor": "#F6F6FA"
},
"collaborators": [
{
"userId": "muhammed.arif#gmail.com",
"name": "muhammed Arif P T",
"shortName": "MA"
},
{
"userId": "anand.sajan#gmail.com",
"name": "Anand Sajan",
"shortName": "AS"
}
],
"created_date": "2021-02-18T19:30:15.238000Z",
"modified_date": "2021-03-11T11:45:49.583000Z"
}
}
You cannot change the type of an existing field. However, you can add another sub-field of type wildcard, like this:
PUT http://localhost:9001/indexname/_mapping
{
"properties": {
"name": {
"type": "text",
"fields": {
"wildcard": {
"type" :"wildcard"
},
"keyword": {
"type" :"keyword",
"ignore_above":256
}
}
}
}
}
When the mapping is updated, you need to reindex your data so that the new field gets indexed, like this:
POST http://localhost:9001/indexname/_update_by_query
And then when this finishes, you'll be able to query on this new field like this:
{
"query": {
"bool": {
"should": [
{
"wildcard": {
"name.wildcard": "*hello world*"
}
}
]
}
}
}
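Why the original query on name finds nothing: wildcard is a term-level query, so the pattern is compared with each indexed term, and a text field is analyzed into separate terms. The sketch below imitates this in plain Python. The analyze function (lowercase, split on whitespace) is a simplified stand-in for the standard analyzer, and fnmatch stands in for Elasticsearch's wildcard matching; both are assumptions made for illustration only.

```python
from fnmatch import fnmatchcase

def analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def wildcard_matches(pattern, terms):
    """Return True if the pattern matches at least one indexed term."""
    return any(fnmatchcase(term, pattern) for term in terms)

value = "hello world example"
pattern = "*hello world*"

text_terms = analyze(value)  # a text field indexes one term per word
whole_terms = [value]        # a keyword or wildcard field indexes the whole string

print(wildcard_matches(pattern, text_terms))   # False: no single term contains a space
print(wildcard_matches(pattern, whole_terms))  # True: the whole value is one term
```

Since no individual term of the text field contains a space, a pattern with a space can only match on a field that keeps the whole value, such as the name.wildcard sub-field.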

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running on Elasticsearch 5.5
I have a document with the following mapping
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
I'll add a document as follows:
PUT my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call:
GET my_shops/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
I get the following errors:
[error screenshot not preserved; source: discourse.org]
But when I change the "precision" field to an int, I get the intended search results.
I'm confused on two fronts.
Why is there a context error? The documentation seems to say that this is ok
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
Why can't I use string values for the precision values?
At the bottom of the page, I see that the precision values can take either distances or numeric values.

Elasticsearch unexpected results when sorting against deeply nested attributes

I'm trying to perform some sorting based on the attributes of a document's deeply nested children.
Let's say we have an index filled with publisher documents. A publisher has a collection of books, and
each book has a title, a published flag, and a collection of genre scores. A genre_score represents how well
a particular book matches a particular genre, or in this case a genre_id.
First, let's define some mappings (for simplicity, we will only be explicit about the nested types):
curl -XPUT 'localhost:9200/book_index' -d '
{
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested"
}
}
}
}
}
}
}'
Here are our two publishers:
curl -XPUT 'localhost:9200/book_index/publisher/1' -d '
{
"name": "Best Books Publishing",
"books": [
{
"name": "Published with medium genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 50 },
{ "genre_id": 2, "score": 15 }
]
}
]
}'
curl -XPUT 'localhost:9200/book_index/publisher/2' -d '
{
"name": "Puffin Publishers",
"books": [
{
"name": "Published book with low genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 10 },
{ "genre_id": 4, "score": 10 }
]
},
{
"name": "Unpublished book with high genre_id of 1",
"published": false,
"genre_scores": [
{ "genre_id": 1, "score": 100 },
{ "genre_id": 2, "score": 35 }
]
}
]
}'
And here is the final definition of our index & mappings...
curl -XGET 'localhost:9200/book_index/_mappings?pretty=true'
...
{
"book_index": {
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested",
"properties": {
"genre_id": {
"type": "long"
},
"score": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"published": {
"type": "boolean"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Now suppose we want to query for a list of publishers and have them sorted by how well their books perform
in a particular genre. In other words, sort the publishers by the genre_scores.score of one of their books
for the target genre_id.
We might write a search query like this...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"term": {
"books.genre_scores.genre_id": 1
}
}
}
}
],
"_source":false,
"query": {
"nested": {
"path": "books",
"query": {
"bool": {
"must": []
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Which correctly returns the Puffin (with a sort value of [100]) first and Best Books second (with a sort value of [50]).
But suppose we only want to consider books for which published is true. This would change our expectation to have Best Books first (with a sort of [50]) and Puffin second (with a sort of [10]).
Let's update our nested_filter and query to the following...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"bool": {
"must": [
{
"term": {
"books.genre_scores.genre_id": 1
}
}, {
"term": {
"books.published": true
}
}
]
}
}
}
}
],
"_source": false,
"query": {
"nested": {
"path": "books",
"query": {
"term": {
"books.published": true
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Suddenly, the sort values for both publishers have become [-9223372036854775808].
Why does adding an additional term to our nested_filter in the top-level sort have this impact?
Can anyone provide some insight as to why this behavior is happening? And additionally, if there are any viable solutions to the proposed query/sort?
This occurs in both ES1.x and ES5
Thanks!

Elasticsearch: Query nested object contained within an object

I'm struggling to build a query where I can do a nested search across a sub-object of a document.
Say I have the following index/mapping:
curl -XPOST "http://localhost:9200/author/" -d '
{
"mappings": {
"item": {
"properties": {
"books": {
"type": "object",
"properties": {
"data": {
"type": "nested"
}
}
}
}
}
}
}
'
And the following 2 documents in the index:
{
"id": 1,
"name": "Robert Louis Stevenson",
"books": {
"count": 2,
"data": [
{
"id": 1,
"label": "Treasure Island"
},
{
"id": 3,
"label": "Dr Jekyll and Mr Hyde"
}
]
}
}
and
{
"id": 2,
"name": "Philip K. Dick",
"books": {
"count": 1,
"data": [
{
"id": 4,
"label": "Do Android Dream of Electric Sheep"
}
]
}
}
I have an array of book IDs, say [1,4]; how would I write a query which does a keyword search of the author name AND only returns authors who wrote one of the books in the array?
I haven't managed to get a query which doesn't cause some sort of query parse_exception, but as a starting block, here's the current iteration of my query - maybe it's obvious where I'm going wrong?
{
"query": {
"bool": {
"must": {
"match": {
"label": "Louis"
}
}
},
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
},
"from": 0,
"size": 8
}
In the above scenario I'd like the document for Mr Robert Louis Stevenson to be returned, as his name contains Louis and he wrote book ID 1.
For what it's worth, the current error I get looks like this:
{
"error": {
"root_cause": [
{
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "author",
"node": "sCk3su4YSnqhvdTGjOztlw",
"reason": {
"type": "parse_exception",
"reason": "failed to parse search source. expected field name but got [START_OBJECT]"
}
}
]
},
"status": 400
}
This makes me feel like I've got my "nested" object all wrong, but the docs suggest that I'm right!
You have it almost right: the nested query simply needs to be placed inside the bool query, as in the query below. Also, the match query needs to target the name field, since that is where the author name is stored:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Louis"
}
},
{
"nested": {
"path": "books.data",
"query": {
"bool": {
"must": {
"terms": {
"books.data.id": [
1,
4
]
}
}
}
}
}
}
]
}
},
"from": 0,
"size": 8
}
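If the list of book IDs comes from application code, the same body can be assembled programmatically. A minimal sketch in Python; the helper name author_with_books_query is hypothetical, and the inner bool/must wrapper around terms is dropped here because a single terms clause is equivalent on its own:

```python
import json

def author_with_books_query(name_text, book_ids, size=8):
    """Hypothetical helper: match on the author name AND require at least one nested book id."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Full-text match on the top-level author name.
                    {"match": {"name": name_text}},
                    # Nested clause: at least one entry of books.data has one of the ids.
                    {
                        "nested": {
                            "path": "books.data",
                            "query": {"terms": {"books.data.id": list(book_ids)}},
                        }
                    },
                ]
            }
        },
        "from": 0,
        "size": size,
    }

body = author_with_books_query("Louis", [1, 4])
print(json.dumps(body, indent=2))
```

Both clauses sit in the same must array, so a document has to satisfy the name match and the nested book filter together.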
