within Array search in ElasticSearch - elasticsearch

I need to search in array of ElasticSearch. I've documents like
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
],
},
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "64GB"
},
{
"color": "white",
"memory": "64GB"
}
],
}
I want to search iphone 9 with color = black and memory = 64GB. I'm using following query
_search?q=product_name:"iPhone 9"+AND+features.color:"black"+AND+features.memory:"64GB"
Only the second record from the document should get listed, but this query is displaying both the records as it matches color with first array and memory with second array. How can I achieve the correct result?

Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.
Your document will be transformed internally and stored as
{
"product_name" : "iPhone 9",
"features.color" : [ "black", "white" ],
"features.memory" : [ "128GB", "64GB" ]
}
The associate between color and memory is lost.
If you need to maintain independence of each inner object of array , you need to use nested type
Nested type can be only queried using nested query.
PUT index-name
{
"mappings": {
"properties": {
"features": {
"type": "nested"
}
}
}
}
PUT index-name/_doc/1
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
],
}
GET index-name/_search
{
"query": {
"nested": {
"path": "features",
"query": {
"bool": {
"must": [
{ "match": { "features.color": "black" }},
{ "match": { "features.memory": "64GB" }}
]
}
}
}
}
}

Related

How to search in filtered list of nested objects only

this seems to be a complex one - not sure if it's possible to manage without scripting + I would like to be able to boost name or value fields.
Let's imagine the following documents:
{
"name":"Red Big Blue document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"red"
},
{
"key":"description 2",
"value":"big"
},
{
"key":"description 3",
"value":"blue"
}
]
}
{
"name":"Black Small Red document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"red"
},
{
"key":"description 2",
"value":"small"
},
{
"key":"description 3",
"value":"black"
}
]
}
{
"name":"Yellow Big Red document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"yellow"
},
{
"key":"description 2",
"value":"big"
},
{
"key":"description 3",
"value":"red"
}
]
}
I wish to get the documents that have the key description 1 of the value of red only (first and second document) - the last document should not be in results.
TLDR;
Elastic flatten objects. Such that
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
Turn into:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
To avoid that you need to set the mapping manually by telling that nested_key_value_properties is going to be a nested field.
And then perform a nested query.
See below for how to do so.
To reproduce
PUT /71217348/
{
"settings": {},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"nested_key_value_properties": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71217348"}}
{"name":"Red Big Blue document","nested_key_value_properties":[{"key":"description 1","value":"red"},{"key":"description 2","value":"big"},{"key":"description 3","value":"blue"}]}
{"index":{"_index":"71217348"}}
{"name":"Black Small Red document","nested_key_value_properties":[{"key":"description 1","value":"red"},{"key":"description 2","value":"small"},{"key":"description 3","value":"black"}]}
{"index":{"_index":"71217348"}}
{"name":"Yellow Big Red document","nested_key_value_properties":[{"key":"description 1","value":"yellow"},{"key":"description 2","value":"big"},{"key":"description 3","value":"red"}]}
GET /71217348/_search
{
"query": {
"nested": {
"path": "nested_key_value_properties",
"query": {
"bool": {
"must": [
{
"match": {
"nested_key_value_properties.key": "description 1"
}
},
{
"match": {
"nested_key_value_properties.value": "red"
}
}
]
}
}
}
}
}

Sorting result based on dynamic terms

Imagine that I have an index with the following three documents representing images and their colors.
[
{
"id": 1,
"intensity": {
"red": 0.6,
"green": 0.1,
"blue": 0.3
}
},
{
"id": 2,
"intensity": {
"red": 0.5,
"green": 0.6,
"blue": 0.0
}
},
{
"id": 3,
"intensity": {
"red": 0.98,
"green": 0.0,
"blue": 0.0
}
}
]
It the user wants a "red image" (selected in a dropdown or in a “tag cloud”), it is very convenient to do a range query over the floats (maybe intensity.red > 0.5). I can also use the score of that query to get the "red-est" image ranked highest.
However, if I would like to offer free-text search, it gets harder. My solution to that would to index the documents as the following (eg use the if color > 0.5 then append(colors, color_name) at index time):
[
{
"id": 1,
"colors": ["red"]
},
{
"id": 2,
"colors": ["green", "red"]
}
{
"id": 3,
"colors": ["red"]
}
]
I could now use a query_string or a match on the colors field and then search for "red", but all of a sudden I lost my ranking possibilities. ID 3 is far more red than ID 1 (0.98 vs 0.6) but the score will be similar?
My question is: Can I have the cake and eat it too?
One solution I see is to have one index that turns free text into "keywords" which I later use in the actual search.
POST image_tag_index/_search {query: "redish"} -> [ "red" ]
POST images/_search {query: {"red" > 0.5}} -> [ {id: 1}, {id: 3}]
But then I need to fire two searches for every search, but maybe that is the only option?
You can make use of nested data type along with function_score query to get the desired result.
You need to change the way you are storing image data. The mapping will be as below:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "integer"
},
"image": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"intensity": {
"type": "float"
}
}
}
}
}
}
}
Index the image data as below:
PUT test/_doc/1
{
"id": 1,
"image": [
{
"color": "red",
"intensity": 0.6
},
{
"color": "green",
"intensity": 0.1
},
{
"color": "blue",
"intensity": 0.3
}
]
}
The above corresponds to the first image data the you posted in the question. Similarly you can index other images data.
Now when a user search for red the query should be build as below:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "image",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"image.color": "red"
}
},
{
"range": {
"image.intensity": {
"gt": 0.5
}
}
}
]
}
},
"field_value_factor": {
"field": "image.intensity",
"modifier": "none",
"missing": 0
}
}
}
}
}
]
}
}
}
You can see in the above query that I have used the field value of image.intensity to calculate the score.

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstading on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not, this solution gives every record that has language_id: 1 and has an office.data.office.id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index for proofing against false hits, one with different language_id and one with different office ids and only the matching one returned.
If you only need the office data, then that's a bit different but still solvable.

Item variants in ElasticSearch

What is the best way to use item variants in elasticsearch and retrieving only 1 item of the variant group?
For example, let's say I have the following items:
[{
"sku": "abc-123",
"group": "abc",
"color": "red",
"price": 10
},
{
"sku": "def-123",
"group": "def",
"color": "red",
"price": 10
},
{
"sku": "abc-456",
"group": "abc",
"color": "black",
"price": 20
}
]
The first item and the last one are in the same group, so I want only to return one of them if I query for items below the price of 20 (for example), but with the best hit score.
Feel free to suggest documents design and queries accordingly.
If your mapping is of Nested datatype, then you can use this to retrieve them.
GET index/type/_search
{
"size": 2000,
"_source": false,
"query": {
"bool": {
"filter": {
"nested": {
"path": "childs",
"query": {
"bool": {
"filter": {
"term": {
"childs.group.keyword": "abc"
}
}
}
},
"inner_hits": {}
}
}
}
}
}

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I've got a wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red in the first record and took the property position=P in the second record.
How can I specify that the search must match the 2 two terms in the same record (within or not a list of nested records) ?
In fact, the good answer is only the document 1, with John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly create inner object for your field (team in particular example) and flattens it to structure like
{
'team.code': ['red', 'green'],
'team.position: ['S', 'P']
}
So you lose your order. To avoid this you need explicitly put nested mapping, index your document as always and query them with nested query
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
PUT so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will result with empty hits.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}

Resources