Insert child document in Elasticsearch update_by_query - elasticsearch

i have a parent-child model in Elasticsearch and i would like to add a new child to a parent based on a query. Naturally the update_by_user call is a candidate for this. However i struggle to insert a new child document in the script section. Does anyone has a suggestion on how to do this? Manual inserts would work, but performance would very slow.
Example scenario: Two parents, one is married, one is single. The married one already has a child. The update should now add a new child to all married parents.
Index mappings
PUT /family
{
"mappings": {
"properties": {
"familyRelation": {
"type": "join",
"relations": {
"parent": "child"
}
},
"name": {
"type": "keyword"
},
"status": {
"type": "married"
}
}
}
}
Parent document:
PUT /family/_doc/dave
{
"name": "dave",
"familyRelation": "parent",
"status": "married"
}
PUT /family/_doc/tom
{
"name": "tom",
"familyRelation": "parent",
"status": "single"
}
Child document:
PUT /family/_doc/amelie/?routing=dave
{
"familyRelation": {
"name": "child",
"parent": "dave"
},
"name": "amelie"
}
An (incomplete) update by query would look like this:
POST /family/_update_by_query
{
"script": {
"source": "???",
"lang": "painless",
"params": {
"child": {
"familyRelation": {
"name": "child",
"parent": "Dave"
},
"name": "Jonathan"
}
}
},
"query": {
"term": {
"status": "married"
}
}
}

Related

Elasticsearch - nested types vs collapse/aggs

I have a use case where I need to find the latest data based on some fields.
The fields are:
category.name
category.type
createdAt
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
The problem is that category.* is a nested field and I can't aggs/collapse these fields because ES doesn't support it.
Mapping:
PUT data
{
"mappings": {
"properties": {
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSSZ"
},
"category": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
},
"approved": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
Data:
POST data/_create/1
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/2
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
POST data/_create/3
{
"category": [
{
"name": "Rick J.",
"level": "B"
}
],
"createdBy": "Rick",
"createdAt": "2022-03-02 02:09:27.527+0200",
"approved": "no"
}
I'm looking for either a search query that can handle that in an acceptable performant way, or a new object design without nested type where I could take advantage of aggs/collapse feature.
Any suggestion will be really appreciated.
About your first question,
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
I believe you can do something along those lines:
GET /72088168/_search
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.name": "John G."
}
},
{
"match": {
"category.level": "A"
}
}
]
}
}
}
},
"sort": [
{
"createdAt": {
"order": "desc"
}
}
],
"size":1
}
For the 2nd matter, it really depends on what you are aiming to do. could merge category.name and category.level in the same field. Such that you document would look like:
{
"category": ["John G. A","Chris T. A"],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
No more nested needed. Although I agree it feels like using tape to fix your issue.

Term aggregation on ElasticSearch join

I would like to perform an aggregation on a join relation using ElasticSearch 7.7.
I need to know how many children I have for each parent.
The only way that I found to solve my issue is to use script inside term aggregation, but my concern is about performance.
/my_index/_search
{
"size": 0,
"aggs": {
"total": {
"terms": {
"script": {
"lang": "painless",
"source": "params['_source']['my_join']['parent']"
}
}
},
"max_total": {
"max_bucket": {
"buckets_path": "total>_count"
}
}
}
}
Someone knows a more fast way to execute this aggregation avoiding the script?
If the join field wasn't a parent/child I could replace the term aggregation with:
"terms": { "field": "my_field" }
To give more context I add some information about mapping:
I'm using Elastic 7.7.
I also attach a mapping with some sample documents:
{
"mappings": {
"properties": {
"my_join": {
"relations": {
"other": "doc"
},
"type": "join"
},
"reader": {
"type": "keyword"
},
"name": {
"type": "text"
},
"content": {
"type": "text"
}
}
}
}
PUT example/_doc/1
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/2
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/3
{
"content": "abc",
"my_join": {
"name": "doc",
"parent": 1
}
}
PUT example/_doc/4
{
"content": "def",
"my_join": {
"name": "doc"
"parent": 2
}
}
PUT example/_doc/5
{
"content": "def",
"acl_join": {
"name": "doc"
"parent": 1
}
}

Parent Child Relation In Elastic Search 7.5

I am new to "Elastic Search" and currently trying to understand how does ES maintain "Parent-Child" relationship. I started with the following article:
https://www.elastic.co/blog/managing-relations-inside-elasticsearch
But the article is based on old version of ES and I am currently using ES 7.5 which states that:
The _parent field has been removed in favour of the join field.
Now I am currently following this article:
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/parent-join.html
However, I am not able to get the desired result.
I have a scenario in which i have two indices "Person" and "Home". Each "Person" can have multiple "Home" which is basically a one-to-many relation. Problem is when I query to fetch all homes whose parent is "XYZ" person the answer is null.
Below are my indexes structure and search query:
Person Index:
Request URL: http://hostname/person
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Home Index:
Request URL: http://hostname/home
{
"mappings": {
"properties": {
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Adding data in person Index
Request URL: http://hostname/person/_doc/1
{
"name": "shujaat",
"person_home": {
"name": "person"
}
}
Adding data in home index
Request URL: http://hostname/home/_doc/2?routing=1&refresh
{
"state": "ontario",
"person_home": {
"name": "home",
"parent": "1"
}
}
Query to fetch data: (To fetch all the records who parent is person id "1")
Request URL: http://hostname/person/_search
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"name": "shujaat"
}
}
}
}
}
OR
{
"query": {
"has_parent": {
"parent_type": "person",
"query": {
"match": {
"_id": "1"
}
}
}
}
}
Response:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
I am unable to understand what I am missing here or what is wrong with the above mentioned query as it not returning any data.
You should put the parent and child documents in the same index:
The join datatype is a special field that creates parent/child
relation within documents of the same index.
So the mapping would look like the following:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"name": {
"type": "text"
},
"state": {
"type": "text"
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Notice that it has both fields from your original person and home indexes.
The rest of your code should work just fine. Try inserting the person and home documents into the same index person_home and use the queries as you posted in the question.
What if person and home objects have overlapping field names?
Let's say, both object types have got field name but we want to index and query them separately. In this case we can come up with a mapping like this:
PUT http://hostname/person_home
{
"mappings": {
"properties": {
"person": {
"properties": {
"name": {
"type": "text"
}
}
},
"home": {
"properties": {
"name": {
"type": "keyword"
},
"state": {
"type": "text"
}
}
},
"person_home": {
"type": "join",
"relations": {
"person": "home"
}
}
}
}
}
Now, we should change the structure of the objects themselves:
PUT http://hostname/person_home/_doc/1
{
"name": "shujaat",
"person_home": {
"name": "person"
}
}
PUT http://hostname/person_home/_doc/2?routing=1&refresh
{
"home": {
"name": "primary",
"state": "ontario"
},
"person_home": {
"name": "home",
"parent": "1"
}
}
If you have to migrate old data from the two old indexes into a new merged one, reindex API may be of use.

Elasticsearch Nested Parent-Child mapping

I'd like to map the following structure:
- I have blog posts
- Blog posts can have comments
- Comments can have replies (which are also comments), so it should be a recursive datastructure
POST -----*--> COMMENT
COMMENT -----*---> COMMENT
Here's what I tried:
mappings: {
"comment": {
"properties": {
"content": { "type": "string" },
"replies": { "type": "comment" }
}
},
"post": {
"properties": {
"comments": {
"type": "comment"
}
}
}
}
Of course it's not working. How can I achieve this?
You're trying to declare the types as you would do in OO programming, that's not how it works in ES. You need to use parent-child relationships like below, i.e. post doesn't have a field called comments but the comment mapping type has a _parent meta field referencing the post parent type.
Also in order to model replies I suggest to simply have another field called in_reply_to which would contain the id of the comment that the reply relates to. Much easier that way!
PUT blogs
{
"mappings": {
"post": {
"properties": {
"title": { "type": "string"}
}
},
"comment": {
"_parent": {
"type": "post"
},
"properties": {
"id": {
"type": "long"
}
"content": {
"type": "string"
},
"in_reply_to": {
"type": "long"
}
}
}
}
}
mappings: {
"post": {
"properties": {
"content": { "type": "string" },
"comment": {
"properties" : {
"content": { "type": "string" },
"replies": {
"properties" : {
"content": { "type": "string" }
}
}
}
}
}

ElasticSearch: search inside the array of objects

I have a problem with querying objects in array.
Let's create very simple index, add a type with one field and add one document with array of objects (I use sense console):
PUT /test/
PUT /test/test/_mapping
{
"test": {
"properties": {
"parent": {"type": "object"}
}
}
}
POST /test/test
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Now I want to search by both names "turkey" and "turkey,mugla-province" . The first query works fine:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey"}}}
But the second one returns nothing:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey,mugla-province"}}}
I tried a lot of stuff including:
"parent": {
"type": "nested",
"include_in_parent": true,
"properties": {
"label": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"store": true
}
}
}
But nothing helps. What do I miss?
Here's one way you can do it, using nested docs:
I defined an index like this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"parent": {
"type": "nested",
"properties": {
"label": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Indexed your document:
PUT /test_index/doc/1
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Then either of these queries will return it:
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey"
}
}
}
}
}
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey,mugla-province"
}
}
}
}
}
Here's the code I used:
http://sense.qbox.io/gist/6258f8c9ee64878a1835b3e9ea2b54e5cf6b1d9e
For search multiple terms use the Terms query instead of Term query.
"terms" : {
"tags" : [ "turkey", "mugla-province" ],
"minimum_should_match" : 1
}
There are various ways to construct this query, but this is the simplest and most elegant in the current version of ElasticSearch (1.6)

Resources