How to search in filtered list of nested objects only - elasticsearch

This seems to be a complex one; I'm not sure it's possible to handle without scripting. I would also like to be able to boost the name or value fields.
Let's imagine the following documents:
{
"name":"Red Big Blue document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"red"
},
{
"key":"description 2",
"value":"big"
},
{
"key":"description 3",
"value":"blue"
}
]
}
{
"name":"Black Small Red document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"red"
},
{
"key":"description 2",
"value":"small"
},
{
"key":"description 3",
"value":"black"
}
]
}
{
"name":"Yellow Big Red document",
"nested_key_value_properties":[
{
"key":"description 1",
"value":"yellow"
},
{
"key":"description 2",
"value":"big"
},
{
"key":"description 3",
"value":"red"
}
]
}
I want to get only the documents where the key description 1 has the value red (the first and second documents); the last document should not be in the results.

TL;DR
Elasticsearch flattens objects, so that
{
"group" : "fans",
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
is turned into:
{
"group" : "fans",
"user.first" : [ "alice", "john" ],
"user.last" : [ "smith", "white" ]
}
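To make the flattening concrete, here is a minimal Python sketch (a local model, not Elasticsearch's actual implementation) that collects every leaf value of an object field under its dotted path. The function name `flatten_object` is hypothetical. Unlike Elasticsearch, it does not analyze values, so it keeps the original casing and element order (the lowercased values above reflect text analysis), and it wraps single values such as `group` in one-element lists.

```python
def flatten_object(doc):
    """Collect every leaf value under its dotted field path.

    Values coming from different elements of an array of objects end up in the
    same list, which is why the pairing between fields is lost.
    """
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # array elements share the same path
                walk(item, path)
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, "")
    return flat


doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}
print(flatten_object(doc))
# {'group': ['fans'], 'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}
```

Nothing in the flattened result records that "John" belongs with "Smith" rather than "White".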
To avoid that, you need to set the mapping explicitly, declaring nested_key_value_properties as a nested field, and then perform a nested query.
See below for how to do so.
To reproduce:
PUT /71217348/
{
"settings": {},
"mappings": {
"properties": {
"name": {
"type": "text"
},
"nested_key_value_properties": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71217348"}}
{"name":"Red Big Blue document","nested_key_value_properties":[{"key":"description 1","value":"red"},{"key":"description 2","value":"big"},{"key":"description 3","value":"blue"}]}
{"index":{"_index":"71217348"}}
{"name":"Black Small Red document","nested_key_value_properties":[{"key":"description 1","value":"red"},{"key":"description 2","value":"small"},{"key":"description 3","value":"black"}]}
{"index":{"_index":"71217348"}}
{"name":"Yellow Big Red document","nested_key_value_properties":[{"key":"description 1","value":"yellow"},{"key":"description 2","value":"big"},{"key":"description 3","value":"red"}]}
GET /71217348/_search
{
"query": {
"nested": {
"path": "nested_key_value_properties",
"query": {
"bool": {
"must": [
{
"match": {
"nested_key_value_properties.key": "description 1"
}
},
{
"match": {
"nested_key_value_properties.value": "red"
}
}
]
}
}
}
}
}
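The difference between the default object mapping and the nested mapping can be modeled in plain Python against the three sample documents. This is only a local sketch: it assumes exact string equality (roughly what `match` does on the `keyword` fields in the mapping above), ignores scoring, and the function names are hypothetical.

```python
# nested_key_value_properties of the three sample documents, keyed by name.
DOCS = {
    "Red Big Blue document": [
        {"key": "description 1", "value": "red"},
        {"key": "description 2", "value": "big"},
        {"key": "description 3", "value": "blue"},
    ],
    "Black Small Red document": [
        {"key": "description 1", "value": "red"},
        {"key": "description 2", "value": "small"},
        {"key": "description 3", "value": "black"},
    ],
    "Yellow Big Red document": [
        {"key": "description 1", "value": "yellow"},
        {"key": "description 2", "value": "big"},
        {"key": "description 3", "value": "red"},
    ],
}


def matches_flattened(objs, criteria):
    """Object mapping: each clause may be satisfied by a DIFFERENT element."""
    return all(any(o.get(f) == v for o in objs) for f, v in criteria.items())


def matches_nested(objs, criteria):
    """Nested mapping: one element must satisfy every clause."""
    return any(all(o.get(f) == v for f, v in criteria.items()) for o in objs)


criteria = {"key": "description 1", "value": "red"}
print([n for n, objs in DOCS.items() if matches_flattened(objs, criteria)])
# all three documents, including "Yellow Big Red document"
print([n for n, objs in DOCS.items() if matches_nested(objs, criteria)])
# ['Red Big Blue document', 'Black Small Red document']
```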

Related

Aggregate by property on parent document with Elasticsearch join field

I have an Elasticsearch index that uses a join type field to relate two types of indexed documents to each other via a parent-child relation: posts which are parents of comments.
posts have a category keyword field, and comments belong to posts. I would like to find the number of comments in each post category, like so:
// what query do I need to get this result?
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2
},
{
"key" : "Cats",
"doc_count" : 1
}
]
}
}
}
Here is a complete example:
I have an index with the following mapping:
PUT posts-index/
{
"mappings": {
"properties": {
"post": {
"type": "object",
"properties": {
"category": {
"type": "keyword"
}
}
},
"text": {
"type": "keyword"
},
"post_comment_join": {
"type": "join",
"relations": {
"post": "comment"
}
}
}
}
}
I create two posts, one in the Dogs category, and one in the Cats category:
PUT posts-index/_doc/post-1
{
"text": "this is a dog post",
"post": {
"category": "Dogs"
},
"post_comment_join": {
"name": "post"
}
}
PUT posts-index/_doc/post-2
{
"text": "this is a cat post",
"post": {
"category": "Cats"
},
"post_comment_join": {
"name": "post"
}
}
Then, I create a few comments (in this case, 2 on the dog post and 1 on the cat post):
PUT posts-index/_doc/comment-1?routing=post-1&refresh
{
"text": "this is comment 1 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-2?routing=post-1&refresh
{
"text": "this is comment 2 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-3?routing=post-2&refresh
{
"text": "this is a comment 1 for post 2",
"post_comment_join": {
"name": "comment",
"parent": "post-2"
}
}
I can search for all comment documents using a has_parent query:
POST posts-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
}
}
Response:
{
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [ /* returns the 3 comments */ ]
}
}
What I can't figure out is how to find the number of comments in each category.
I've looked into Parent Aggregations, but they seem to only allow you to aggregate based on the type of the parent. In this case, all parents are of type post, so that doesn't help.
I've also tried using a basic terms aggregation using the join_field#parent_field syntax:
POST posts-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
},
"aggs": {
"agg-by-post-category": {
"terms": {
"field": "post_comment_join#post.category"
}
}
}
}
// returns { "buckets": [] } in the aggs
Unfortunately, this returns no results. It seems as though the post_comment_join#post syntax can be used to aggregate by parent doc, but not by an attribute on the parent doc (i.e., by the _id of a post, but not by post.category).
Can anyone help me figure out the right aggs syntax to return all comments grouped by their parent post's category?
Again, here is the result I'm looking for:
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2
},
{
"key" : "Cats",
"doc_count" : 1
}
]
}
}
}
Platform details:
Amazon OpenSearch Service version 7.9
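You can use either of the two queries below to find the count of comments by category. The first returns each post that has comments, with its category in _source and its comment count in the inner_hits total; the second aggregates by category directly, with the comment count of each category bucket reported by the children sub-aggregation.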
GET posts-index/_search
{
"query": {
"has_child": {
"type": "comment",
"inner_hits": {
"_source": false,
"size": 0
},
"query": {
"match_all": {}
}
}
}
}
GET posts-index/_search
{
"aggs": {
"top-tags": {
"terms": {
"field": "post.category",
"size": 10
},
"aggs": {
"to-answers": {
"children": {
"type": "comment"
},
"aggs": {
"comments-count": {
"value_count": {
"field": "text"
}
}
}
}
}
}
}
}
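As a rough model of what the second query computes, the Python sketch below uses hypothetical in-memory copies of the posts and comments from the question. Note that each terms bucket's own doc_count counts the posts in that category; the comment count per category is the doc_count of the `to-answers` children sub-aggregation (the `comments-count` value_count should agree here, since every comment has a `text` value). The function name is hypothetical.

```python
# Hypothetical in-memory copies of the indexed documents.
POSTS = {"post-1": "Dogs", "post-2": "Cats"}  # post _id -> post.category
COMMENTS = [  # (comment _id, parent post _id)
    ("comment-1", "post-1"),
    ("comment-2", "post-1"),
    ("comment-3", "post-2"),
]


def comment_counts_by_category(posts, comments):
    """Model terms(post.category) with a children(comment) sub-aggregation."""
    buckets = {}
    for category in posts.values():
        bucket = buckets.setdefault(
            category, {"doc_count": 0, "to-answers": {"doc_count": 0}}
        )
        bucket["doc_count"] += 1  # parent posts in this category
    for _comment_id, parent_id in comments:
        # each child comment is counted in its parent's category bucket
        buckets[posts[parent_id]]["to-answers"]["doc_count"] += 1
    return buckets


for category, bucket in comment_counts_by_category(POSTS, COMMENTS).items():
    print(category, bucket["to-answers"]["doc_count"])
```

So the numbers the question asks for (Dogs 2, Cats 1) would be read from `to-answers.doc_count`, not from the category bucket's own doc_count.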

within Array search in ElasticSearch

I need to search within an array in Elasticsearch. I have documents like:
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
]
},
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "64GB"
},
{
"color": "white",
"memory": "64GB"
}
]
}
I want to search for iPhone 9 with color = black and memory = 64GB. I'm using the following query:
_search?q=product_name:"iPhone 9"+AND+features.color:"black"+AND+features.memory:"64GB"
Only the second document should be returned, but this query returns both, because for the first document it matches color against one array element and memory against the other. How can I get the correct result?
Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.
Your document will be transformed internally and stored as
{
"product_name" : "iPhone 9",
"features.color" : [ "black", "white" ],
"features.memory" : [ "128GB", "64GB" ]
}
The association between color and memory is lost.
If you need to keep each inner object of the array independent, you need to use the nested type.
Nested fields can only be queried using a nested query.
PUT index-name
{
"mappings": {
"properties": {
"features": {
"type": "nested"
}
}
}
}
PUT index-name/_doc/1
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
]
}
GET index-name/_search
{
"query": {
"nested": {
"path": "features",
"query": {
"bool": {
"must": [
{ "match": { "features.color": "black" }},
{ "match": { "features.memory": "64GB" }}
]
}
}
}
}
}

ElasticSearch query nested path filter OR

I have the following index:
PUT /ab11
{
"mappings": {
"properties": {
"product_id": {
"type": "keyword"
},
"data": {
"type": "nested",
"properties": {
"p_id": {
"type": "keyword"
}
}
}
}
}
}
PUT /ab11/_doc/1
{
"product_id": "123",
"data": [
{
"p_id": "a"
},
{
"p_id": "b"
},
{
"p_id": "c"
}
]
}
I want to run the equivalent of the following SQL (note: I want a filter rather than a query, because I don't care about the score):
select * from ab11 where data.p_id = "a" or data.p_id = "b"
You can do it like this because the terms query has OR semantics by default:
GET /ab11/_search
{
"query": {
"nested": {
"path": "data",
"query": {
"terms": {
"data.p_id": [
"a",
"b"
]
}
}
}
}
}
Basically, select all documents which have either "a" or "b" in their data.p_id nested docs.
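Since the question asks for filter semantics, one option is to put the terms clause in a bool filter inside the nested query, so it only includes or excludes documents and does not contribute to scoring. The Python sketch below builds that request body and models the OR matching locally; the helper names `nested_terms_filter` and `would_match` are hypothetical, and the local check is a simplification rather than how Elasticsearch evaluates the query.

```python
import json


def nested_terms_filter(path, field, values):
    """Build a nested query whose terms clause runs in filter context (no scoring)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "filter": [{"terms": {f"{path}.{field}": list(values)}}]
                    }
                },
            }
        }
    }


def would_match(doc, path, field, values):
    """OR semantics: match if ANY nested object has a value from `values`."""
    wanted = set(values)
    return any(obj.get(field) in wanted for obj in doc.get(path, []))


doc = {"product_id": "123", "data": [{"p_id": "a"}, {"p_id": "b"}, {"p_id": "c"}]}
print(json.dumps(nested_terms_filter("data", "p_id", ["a", "b"]), indent=2))
print(would_match(doc, "data", "p_id", ["a", "b"]))  # True
print(would_match(doc, "data", "p_id", ["x"]))       # False
```

The printed JSON is what would be sent as the body of `GET /ab11/_search`.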

Term aggregation on ElasticSearch join

I would like to perform an aggregation on a join relation using ElasticSearch 7.7.
I need to know how many children I have for each parent.
The only way I found to solve my issue is to use a script inside a terms aggregation, but I'm concerned about performance.
GET /my_index/_search
{
"size": 0,
"aggs": {
"total": {
"terms": {
"script": {
"lang": "painless",
"source": "params['_source']['my_join']['parent']"
}
}
},
"max_total": {
"max_bucket": {
"buckets_path": "total>_count"
}
}
}
}
Does anyone know a faster way to run this aggregation without the script?
If the field weren't a parent/child join field, I could replace the terms aggregation with:
"terms": { "field": "my_field" }
To give more context, here is the mapping (Elasticsearch 7.7) along with some sample documents:
{
"mappings": {
"properties": {
"my_join": {
"relations": {
"other": "doc"
},
"type": "join"
},
"reader": {
"type": "keyword"
},
"name": {
"type": "text"
},
"content": {
"type": "text"
}
}
}
}
PUT example/_doc/1
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/2
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/3?routing=1
{
"content": "abc",
"my_join": {
"name": "doc",
"parent": 1
}
}
PUT example/_doc/4?routing=2
{
"content": "def",
"my_join": {
"name": "doc",
"parent": 2
}
}
PUT example/_doc/5?routing=1
{
"content": "def",
"my_join": {
"name": "doc",
"parent": 1
}
}
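For reference, here is what the scripted terms aggregation plus max_bucket compute, modeled in Python over the sample documents. This sketch assumes that doc 5's `acl_join` was meant to be `my_join` and writes parent ids as strings; the function name is hypothetical. It only illustrates the intended result, not a faster Elasticsearch query.

```python
from collections import Counter

# Sample documents from the question, keyed by _id. Assumes doc 5's "acl_join"
# was meant to be "my_join"; parent ids are written as strings here.
DOCS = {
    "1": {"reader": ["A", "B"], "my_join": {"name": "other"}},
    "2": {"reader": ["A", "B"], "my_join": {"name": "other"}},
    "3": {"content": "abc", "my_join": {"name": "doc", "parent": "1"}},
    "4": {"content": "def", "my_join": {"name": "doc", "parent": "2"}},
    "5": {"content": "def", "my_join": {"name": "doc", "parent": "1"}},
}


def children_per_parent(docs):
    """One bucket per parent id, counting child documents (like the scripted
    terms aggregation). Parent documents have no "parent" key and are skipped."""
    return Counter(
        src["my_join"]["parent"] for src in docs.values() if "parent" in src["my_join"]
    )


counts = children_per_parent(DOCS)
max_total = max(counts.values())  # the value max_bucket would report
print(dict(counts), max_total)  # {'1': 2, '2': 1} 2
```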

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I get the wrong result.
ES returns
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red in the first record and took the property position=P in the second record.
How can I specify that the search must match both terms in the same record (whether or not it is inside a list of nested records)?
The correct result is only document 1 (John).
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index a document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly creates an inner object for your field (team in this particular example) and flattens it into a structure like
{
'team.code': ['red', 'green'],
'team.position': ['S', 'P']
}
So you lose the association between each code and its position. To avoid this, you need to explicitly define a nested mapping, index your documents as usual, and query them with a nested query.
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
PUT so/nest/1
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will return no hits, because no single team object has both code red and position P.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}
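The two behaviors described in these answers can be modeled in Python over the dataset from the question. `flatten_team` mirrors the flattened structure shown above, and `nested_match` requires both values in the same team object, as the nested query does. The names are hypothetical, and the model uses exact equality instead of Elasticsearch's analyzed `match` queries.

```python
# Dataset from the question: (id, name, team), where team is an object or a list of objects.
PEOPLE = [
    (1, "John", {"code": "red", "position": "P"}),
    (2, "Jack", {"code": "red", "position": "S"}),
    (3, "Emily", {"code": "green", "position": "P"}),
    (4, "Grace", {"code": "green", "position": "P"}),
    (5, "Steven", [{"code": "green", "position": "S"}, {"code": "red", "position": "S"}]),
    (6, "Josephine", {"code": "red", "position": "S"}),
    (7, "Sydney", [{"code": "red", "position": "S"}, {"code": "green", "position": "P"}]),
]


def as_list(team):
    """A single object behaves like a one-element array."""
    return team if isinstance(team, list) else [team]


def flatten_team(team):
    """Default object mapping: one value list per field, pairing is lost."""
    teams = as_list(team)
    return {
        "team.code": [t["code"] for t in teams],
        "team.position": [t["position"] for t in teams],
    }


def flat_match(team, code, position):
    flat = flatten_team(team)
    return code in flat["team.code"] and position in flat["team.position"]


def nested_match(team, code, position):
    """Nested mapping: both clauses must hold for the same team object."""
    return any(t["code"] == code and t["position"] == position for t in as_list(team))


print([name for _, name, team in PEOPLE if flat_match(team, "red", "P")])    # ['John', 'Sydney']
print([name for _, name, team in PEOPLE if nested_match(team, "red", "P")])  # ['John']
```

This reproduces both the unexpected result from the question (John and Sydney) and the expected one (John only).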
