Parent Child Relation In Elastic Search 7.5 - elasticsearch

I am new to "Elastic Search" and currently trying to understand how does ES maintain "Parent-Child" relationship. I started with the following article:
https://www.elastic.co/blog/managing-relations-inside-elasticsearch
But the article is based on old version of ES and I am currently using ES 7.5 which states that:
The _parent field has been removed in favour of the join field.
Now I am currently following this article:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/parent-join.html
However, I am not able to get the desired result.
I have a scenario in which i have two indices "Person" and "Home". Each "Person" can have multiple "Home" which is basically a one-to-many relation. Problem is when I query to fetch all homes whose parent is "XYZ" person the answer is null.
Below are my indexes structure and search query:
Person Index:
Request URL: http://hostname/person
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Home Index:
Request URL: http://hostname/home
{
"mappings": {
"properties": {
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Adding data in person Index
Request URL: http://hostname/person/_doc/1
{
"name": "shujaat",
"person_home": {
"name": "person"
}
}
Adding data in home index
Request URL: http://hostname/home/_doc/2?routing=1&refresh
{
"state": "ontario",
"person_home": {
"name": "home",
"parent": "1"
}
}
Query to fetch data: (To fetch all the records who parent is person id "1")
Request URL: http://hostname/person/_search
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"name": "shujaat"
}
}
}
}
}
OR
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"_id": "1"
}
}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
I am unable to understand what I am missing here or what is wrong with the above mentioned query as it not returning any data.

You should put the parent and child documents in the same index:
The join datatype is a special field that creates parent/child
relation within documents of the same index.
So the mapping would look like the following:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Notice that it has both fields from your original person and home indexes.
The rest of your code should work just fine. Try inserting the person and home documents into the same index person_home and use the queries as you posted in the question.
What if person and home objects have overlapping field names?
Let's say, both object types have got field name but we want to index and query them separately. In this case we can come up with a mapping like this:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"person": {
"properties": {
"name": {
"type": "text"
}
}
},
"home": {
"properties": {
"name": {
"type": "keyword"
},
"state": {
"type": "text"
}
}
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Now, we should change the structure of the objects themselves:
PUT http://hostname/person_home/_doc/1
{
"name": "shujaat",
"person_home": {
"name": "person"
}
}
PUT http://hostname/person_home/_doc/2?routing=1&refresh
{
"home": {
"name": "primary",
"state": "ontario"
},
"person_home": {
"name": "home",
"parent": "1"
}
}
If you have to migrate old data from the two old indexes into a new merged one, reindex API may be of use.

Related

Aggregate by property on parent document with Elasticsearch join field

I have an Elasticsearch index that uses a join type field to relate two types of indexed documents to each other via a parent-child relation: posts which are parents of comments.
posts have a category keyword field, and comments belong to posts. I would like to find the number of comments in each post category, like so:
// what query do I need to get this result?
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2,
},
{
"key" : "Cats",
"doc_count" : 1,
}
]
}
}
}
Here is a complete example:
I have an index with the following mapping:
PUT posts-index/
{
"mappings": {
"properties": {
"post": {
"type": "object",
"properties": {
"category": {
"type": "keyword"
}
}
},
"text": {
"type": "keyword"
},
"post_comment_join": {
"type": "join",
"relations": {
"post": "comment"
}
}
}
}
}
I create two posts, one in the Dogs category, and one in the Cats category:
PUT posts-index/_doc/post-1
{
"text": "this is a dog post",
"post": {
"category": "Dogs"
},
"post_comment_join": {
"name": "post"
}
}
PUT posts-index/_doc/post-2
{
"text": "this is a cat post",
"post": {
"category": "Cats"
},
"post_comment_join": {
"name": "post"
}
}
Then, I create a few comments (in this case, 2 on the dog post and 1 on the cat post)
PUT posts-index/_doc/comment-1&routing=1&refresh
{
"text": "this is comment 1 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-2&routing=1&refresh
{
"text": "this is comment 2 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-3&routing=1&refresh
{
"text": "this is a comment 1 for post 2",
"post_comment_join": {
"name": "comment",
"parent": "post-2"
}
}
I can search for all comment documents using a has_parent query:
POST post-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
}
}
{
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [ /* returns the 3 comments */ ]
}
}
What I can't figure out how to do is find the number of comments in each category
I've looked into Parent Aggregations, but they seem to only allow you aggregate based on the type of the parent. In this case, all parents are of type post, so that doesn't help.
I've also tried using a basic terms aggregation using the join_field#parent_field syntax:
POST post-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
},
"aggs": {
"agg-by-post-category": {
"terms": {
"field": "post_comment_join#post.category"
}
}
}
}
// returns { "buckets": [] } in the aggs
Unfortunately, this returns no results. It seems as though the post_comment_join#post syntax can be used to aggregate by parent doc, but not by an attribute on the parent doc. (i.e., by the _id field of a post, but not by post.category)
Can anyone help me figure out the right aggs syntax to return all comments grouped by their parent post's category?
Again, here is the result I'm looking for:
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2,
},
{
"key" : "Cats",
"doc_count" : 1,
}
]
}
}
}
Platform details
Amazon Opensearch service version 7.9
You can use any of below two to find count of comments by category.
GET posts-index/_search
{
"query": {
"has_child": {
"type": "comment",
"inner_hits": {
"_source": false,
"size": 0
},
"query": {
"match_all": {}
}
}
}
}
GET posts-index/_search
{
"aggs": {
"top-tags": {
"terms": {
"field": "post.category",
"size": 10
},
"aggs": {
"to-answers": {
"children": {
"type": "comment"
},
"aggs": {
"comments-count": {
"value_count": {
"field": "text"
}
}
}
}
}
}
}
}

Include joined children with Elasticsearch GET request

I have an Elasticsearch index events that has a join field so that an event can have multiple instances (i.e. the same event can occur on different dates). In this simplified mapping, an event doc has fields for title and url while an instance doc has start/end date fields:
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"url": {
"type": "keyword"
},
"dt": {
"type": "date"
},
"end_dt": {
"type": "date"
},
"event_or_instance": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"event": "instance"
}
}
}
}
}
I know how to get an event and includes all of its instances using has_child:
GET /events/_search
{
"query" : {
"bool": {
"filter": [
{
"term": {
"_id": {
"value": "c8871a79-1907-46c0-958c-9731c529b93e"
}
}
},
{
"has_child" : {
"type" : "instance",
"query" : { "match_all": {} },
"inner_hits" : {
"_source": true,
"sort": [{"dt": "asc"}]
}
}
}
]
}
},
"_source": true
}
This works fine, but is there a way to do this using the Get/Multi-get API instead of the Search API?

Elasticsearch - Mapping fields from other indices

How can I define mapping in Elasticsearch 7 to index a document with a field value from another index? For example, if I have a users index which has a mapping for name, email and account_number but the account_number value is actually in another index called accounts in field number.
I've tried something like this without much success (I only see "name", "email" and "account_id" in the results):
PUT users/_mapping
{
"properties": {
"name": {
"type": "text"
},
"email": {
"type": "text"
},
"account_id": {
"type": "integer"
},
"accounts": {
"properties": {
"number": {
"type": "text"
}
}
}
}
}
The accounts index has the following mapping:
{
"properties": {
"name": {
"type": "text"
},
"number": {
"type": "text"
}
}
}
As I understand it, you want to implement field joining as is usually done in relational databases. In elasticsearch, this is possible only if the documents are in the same index. (Link to doc). But it seems to me that in your case you need to work differently, I think your Account object needs to be nested for User.
PUT /users/_mapping
{
"mappings": {
"properties": {
"account": {
"type": "nested"
}
}
}
}
You can further search as if it were a separate document.
GET /users/_search
{
"query": {
"nested": {
"path": "account",
"query": {
"bool": {
"must": [
{ "match": { "account.number": 1 } }
]
}
}
}
}
}

Term aggregation on ElasticSearch join

I would like to perform an aggregation on a join relation using ElasticSearch 7.7.
I need to know how many children I have for each parent.
The only way that I found to solve my issue is to use script inside term aggregation, but my concern is about performance.
/my_index/_search
{
"size": 0,
"aggs": {
"total": {
"terms": {
"script": {
"lang": "painless",
"source": "params['_source']['my_join']['parent']"
}
}
},
"max_total": {
"max_bucket": {
"buckets_path": "total>_count"
}
}
}
}
Someone knows a more fast way to execute this aggregation avoiding the script?
If the join field wasn't a parent/child I could replace the term aggregation with:
"terms": { "field": "my_field" }
To give more context I add some information about mapping:
I'm using Elastic 7.7.
I also attach a mapping with some sample documents:
{
"mappings": {
"properties": {
"my_join": {
"relations": {
"other": "doc"
},
"type": "join"
},
"reader": {
"type": "keyword"
},
"name": {
"type": "text"
},
"content": {
"type": "text"
}
}
}
}
PUT example/_doc/1
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/2
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/3
{
"content": "abc",
"my_join": {
"name": "doc",
"parent": 1
}
}
PUT example/_doc/4
{
"content": "def",
"my_join": {
"name": "doc"
"parent": 2
}
}
PUT example/_doc/5
{
"content": "def",
"acl_join": {
"name": "doc"
"parent": 1
}
}

ElasticSearch: search inside the array of objects

I have a problem with querying objects in array.
Let's create very simple index, add a type with one field and add one document with array of objects (I use sense console):
PUT /test/
PUT /test/test/_mapping
{
"test": {
"properties": {
"parent": {"type": "object"}
}
}
}
POST /test/test
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Now I want to search by both names "turkey" and "turkey,mugla-province" . The first query works fine:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey"}}}
But the second one returns nothing:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey,mugla-province"}}}
I tried a lot of stuff including:
"parent": {
"type": "nested",
"include_in_parent": true,
"properties": {
"label": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"store": true
}
}
}
But nothing helps. What do I miss?
Here's one way you can do it, using nested docs:
I defined an index like this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"parent": {
"type": "nested",
"properties": {
"label": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Indexed your document:
PUT /test_index/doc/1
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Then either of these queries will return it:
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey"
}
}
}
}
}
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey,mugla-province"
}
}
}
}
}
Here's the code I used:
http://sense.qbox.io/gist/6258f8c9ee64878a1835b3e9ea2b54e5cf6b1d9e
For search multiple terms use the Terms query instead of Term query.
"terms" : {
"tags" : [ "turkey", "mugla-province" ],
"minimum_should_match" : 1
}
There are various ways to construct this query, but this is the simplest and most elegant in the current version of ElasticSearch (1.6)

Resources