Elasticsearch: Parent-child relationship after rollover - elasticsearch

Suppose there is a simple blog index which contains two types: blog and comment. One blog can have multiple comments. The index is created like this:
curl -X PUT \
'http://localhost:9200/%3Cblog-%7Bnow%2Fd%7D-000001%3E?pretty=' \
-H 'content-type: application/json' \
-d '{
"mappings": {
"comment": {
"_parent": { "type": "blog" },
"properties": {
"name": { "type": "keyword" },
"comment": { "type": "text" }
}
},
"blog": {
"properties": {
"author": { "type": "keyword" },
"subject": { "type": "text" },
"content": { "type": "text" }
}
}
}
}'
The index name %3Cblog-%7Bnow%2Fd%7D-000001%3E is the URL-encoded form of <blog-{now/d}-000001> (see the Elasticsearch documentation on date math in index names).
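The percent-encoded name can be reproduced with any URL-encoding routine. A minimal Python sketch using only the standard library (the helper name encode_index_name is made up for illustration):

```python
from urllib.parse import quote, unquote


def encode_index_name(name: str) -> str:
    """Percent-encode a date math index name for use in a URL path."""
    # safe="" also encodes "/", which expressions such as now/d contain.
    return quote(name, safe="")


encoded = encode_index_name("<blog-{now/d}-000001>")
wildcard = encode_index_name("blog-*")

print(encoded)           # %3Cblog-%7Bnow%2Fd%7D-000001%3E
print(unquote(encoded))  # <blog-{now/d}-000001>
print(wildcard)          # blog-%2A
```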
Next, we add the 'blog-active' alias to this index. This alias will be used for writing data.
curl -X POST 'http://localhost:9200/_aliases?pretty=' \
-H 'content-type: application/json' \
-d '{ "actions" : [ { "add" : { "index" : "blog-*", "alias" : "blog-active" } } ] }'
Now if we do the following actions:
1. Add a blog using the blog-active alias
curl -X POST http://localhost:9200/blog-active/blog/1 \
-H 'content-type: application/json' \
-d '{
"author": "author1",
"subject": "subject1",
"content": "content1"
}'
2. Add a comment to the blog
curl -X POST \
'http://localhost:9200/blog-active/comment/1?parent=1' \
-H 'content-type: application/json' \
-d '{
"name": "commenter1",
"comment": "new comment1"
}'
3. Do a rollover with max_docs = 2
curl -X POST \
http://localhost:9200/blog-active/_rollover \
-H 'content-type: application/json' \
-d '{
"conditions": {
"max_docs": 2
},
"mappings": {
"comment": {
"_parent": { "type": "blog" },
"properties": {
"name": { "type": "keyword" },
"comment": { "type": "text" }
}
},
"blog": {
"properties": {
"author": { "type": "keyword" },
"subject": { "type": "text" },
"content": { "type": "text" }
}
}
}
}'
4. Add another comment to the blog
curl -X POST \
'http://localhost:9200/blog-active/comment/1?parent=1' \
-H 'content-type: application/json' \
-d '{
"name": "commenter2",
"comment": "new comment2"
}'
Now, if we search all blog indices for comments on blogs by 'author1' (blog-%2A is the URL-encoded form of blog-*):
curl -X POST \
http://localhost:9200/blog-%2A/comment/_search \
-H 'content-type: application/json' \
-d '{
"query": {
"has_parent" : {
"query" : {
"match" : { "author" : { "query" : "author1" } }
},
"parent_type" : "blog"
}
}
}'
the result contains only the first comment.
This is because the second comment is stored in the second index, which does not contain the parent blog document, so the blog's author cannot be resolved from there.
So my question is: how should I approach parent-child relations when rollover is used?
Is the relationship even possible in that case?
Similar question: ElasticSearch parent/child on different indexes

All documents that form part of a parent-child relationship need to live in the same index and, more precisely, on the same shard. It is therefore not possible to keep a parent-child relationship across a rollover, since rollover creates new indices.
One solution to the problem above is to denormalize the data by adding the fields blog_author and blog_id to the comment type. The mapping then looks like this (notice that the parent-child relationship has been removed):
"mappings": {
"comment": {
"properties": {
"blog_id": { "type": "keyword" },
"blog_author": { "type": "keyword" },
"name": { "type": "keyword" },
"comment": { "type": "text" }
}
},
"blog": {
"properties": {
"author": { "type": "keyword" },
"subject": { "type": "text" },
"content": { "type": "text" }
}
}
}
and the query to fetch comments by blog author is:
curl -X POST \
http://localhost:9200/blog-%2A/comment/_search \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{
"query": {
"match": {
"blog_author": "author1"
}
}
}'
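The denormalization step itself happens on the client before the comment is indexed. A minimal sketch of that step, assuming the application already has the parent blog at hand; the helper names denormalize_comment and author_query are hypothetical, and nothing here talks to a cluster:

```python
def denormalize_comment(blog_id, blog, comment):
    """Return a copy of the comment carrying the parent's id and author."""
    doc = dict(comment)  # leave the caller's dict untouched
    doc["blog_id"] = blog_id
    doc["blog_author"] = blog["author"]
    return doc


def author_query(author):
    """Request body that finds comments by the author of their blog."""
    return {"query": {"match": {"blog_author": author}}}


blog = {"author": "author1", "subject": "subject1", "content": "content1"}
comment = {"name": "commenter2", "comment": "new comment2"}

comment_doc = denormalize_comment("1", blog, comment)
print(comment_doc)
print(author_query("author1"))
```

Because each comment now carries everything the query needs, it no longer matters which rolled-over index it lands in. The trade-off is that a change to the blog's author has to be propagated to its comments.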

Related

Copy field value to a new field in existing index

I have a document whose structure includes an object field with a nested field inside it. The nested field stores all the interactions that occurred in an internal communication.
I now need to create a new field inside the nested field, with a new type, that will hold the contents of the old field processed by a different analyzer.
How can I copy the data from the old field to the new field inside the nested field?
My document:
curl -XPUT 'localhost:9200/problems?pretty' -H 'Content-Type: application/json' -d '
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"problem": {
"properties": {
"problemid": {
"type": "long"
},
"subject": {
"type": "text",
"index": true
},
"usermessage": {
"type": "object",
"properties": {
"content": {
"type": "nested",
"properties": {
"messageid": {
"type": "long",
"index": true
},
"message": {
"type": "text",
"index": true
}
}
}
}
}
}
}
}
}'
My New Field:
curl -XPUT 'localhost:9200/problems/_mapping/problem?pretty' -H 'Content-Type: application/json' -d '
{
"properties": {
"usermessage": {
"type": "object",
"properties": {
"content": {
"type": "nested",
"properties": {
"message_accents" : {
"type" : "text",
"analyzer" : "ignoreaccents"
}
}
}
}
}
}
}
'
Data Example:
{
"problemid": 1,
"subject": "Test",
"usermessage": [
{
"messageid": 1,
"message": "Hello"
},
{
"messageid": 2,
"message": "Its me"
}
]
}
My script to copy fields:
curl -XPOST 'localhost:9200/problems/_update_by_query' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {
}
},
"script": "ctx._source.usermessage.content.message_accents = ctx._source.usermessage.content.message"
}'
I tried the code below but it didn't work, it returns an error.
curl -XPOST 'localhost:9200/problems/_update_by_query' -H 'Content-Type: application/json' -d '
{
"query": {
"match_all": {
}
},
"script": "ctx._source.usermessage.content.each { elm -> elm.message_accents = elm.message }"
}
'
Error:
"script":"ctx._source.usermessage.content.each { elm -> elm.message_accents = elm.message }","lang":"painless","caused_by":{"type":"illegal_argument_exception","reason":"unexpected token ['{'] was expecting one of [{, ';'}]."}},"status":500}%
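Painless has Java-like syntax and does not accept Groovy-style closures such as .each { ... }, which is consistent with the parser error above. One possible approach, assuming usermessage.content is stored as a list of objects in _source (as the mapping suggests), is an ordinary for loop. The Python sketch below only builds such an _update_by_query body and mirrors its effect on a local dict; it does not contact a cluster, has not been run against the index above, and the helper names are hypothetical.

```python
import json

# Painless script: loop over the nested entries and copy message into
# message_accents. Assumes usermessage.content is a list in _source.
PAINLESS_COPY = (
    "if (ctx._source.usermessage != null"
    " && ctx._source.usermessage.content != null) {"
    " for (def elm : ctx._source.usermessage.content) {"
    " elm.message_accents = elm.message;"
    " }"
    " }"
)


def build_update_by_query():
    """Request body for POST problems/_update_by_query."""
    return {"query": {"match_all": {}}, "script": PAINLESS_COPY}


def copy_locally(source):
    """Pure-Python mirror of the script, to check the intended effect."""
    content = (source.get("usermessage") or {}).get("content") or []
    for elm in content:
        elm["message_accents"] = elm.get("message")
    return source


sample = {
    "problemid": 1,
    "usermessage": {"content": [{"messageid": 1, "message": "Hello"}]},
}
updated = copy_locally(sample)
print(json.dumps(build_update_by_query(), indent=2))
print(updated)
```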

Elassandra: UDT List Match Query- No Results

I am using Elassandra. In Cassandra, I have a UDT:
CREATE TYPE test.entity_attributes (
attribute_key text,
attribute_value text
);
It is used in this table:
CREATE TABLE test.attributes_test (
id text PRIMARY KEY,
attr list<frozen<entity_attributes>>
);
I mapped the attributes_test using:
curl --location --request PUT 'localhost:9200/attr_index' \
--header 'Content-Type: application/json' \
--data-raw '{
"settings": { "keyspace": "test" },
"mappings": {
"attributes_test" : {
"discover":".*"
}
}
}'
(copied from postman)
It returns the following as mapping:
{
"attr_index": {
"aliases": {},
"mappings": {
"attributes_test": {
"properties": {
"attr": {
"type": "nested",
"cql_udt_name": "entity_attributes",
"properties": {
"attribute_key": {
"type": "keyword",
"cql_collection": "singleton"
},
"attribute_value": {
"type": "keyword",
"cql_collection": "singleton"
}
}
},
"id": {
"type": "keyword",
"cql_collection": "singleton",
"cql_partition_key": true,
"cql_primary_key_order": 0
}
}
}
},
"settings": {
"index": {
"keyspace": "test",
"number_of_shards": "1",
"provided_name": "attr_index",
"creation_date": "1615291749532",
"number_of_replicas": "0",
"uuid": "Oua1ACLbRvCATC-kcGPoQg",
"version": {
"created": "6020399"
}
}
}
}
}
This is what I have in the table:
id | attr
----+----------------------------------------------------------------------------------------------
2 | [{attribute_key: 'abc', attribute_value: '2'}, {attribute_key: 'def', attribute_value: '1'}]
1 | [{attribute_key: 'abc', attribute_value: '1'}]
The problem now is, when I run the following query, it does not return any result.
curl --location --request POST 'localhost:9200/attr_index/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"attr.attribute_key": "abc"
}
}
}'
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html - describes the way to search in nested objects.
How can I search in the list of nested objects?
It was a mistake in my query. The correct query would be:
{
"query": {
"nested": {
"path": "attr",
"query": {
"match": {
"attr.attribute_key": "abc"
}
}
}
}
}
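The difference from the failing query is only the nested wrapper with its path. A small Python sketch that builds the same body for any nested path (the helper name nested_match is hypothetical):

```python
import json


def nested_match(path, field, value):
    """Wrap a match query on path.field in a nested query for that path."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }


body = nested_match("attr", "attribute_key", "abc")
print(json.dumps(body, indent=2))
```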

Elastic Search: return matching parents with matched/unmatched childs

I am using Elasticsearch 7.8.1 and have used the parent-child method to index the documents. My requirement is to search both parent and child documents, but to return the response in a format where the parent document is the main document and the child document is a field within it, i.e.:
1) If the child matches, I wish to return the parent and child in one document. I am able to achieve this using has_child and inner_hits.
2) If the parent matches the query, I wish to return the parent and child in one document even if the child does not match. (Not sure how to achieve this.)
# This is the parent child relationship mapping in index
curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"my_id": {
"type": "keyword"
},
"my_join_field": {
"type": "join",
"relations": {
"question": "answer"
}
}
}
}
}
'
Below is the query I am trying to use, but it does not return the child when the parent matches:
curl -X POST "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "answer",
"query": {
"match": {
"my_id": "4"
}
},
"inner_hits": {
"size": 1
}
}
},
{
"match": {
"my_id": "1"
}
}
]
}
}
}'
# Parent docs
curl -X PUT "localhost:9200/my-index-000001/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "1",
"text": "This is a question",
"my_join_field": "question"
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "2",
"text": "This is another question",
"my_join_field": "question"
}
'
# Child docs
curl -X PUT "localhost:9200/my-index-000001/_doc/3?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "3",
"text": "This is an answer",
"my_join_field": { "name": "answer", "parent": "1" }
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/4?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "4",
"text": "This is another answer",
"my_join_field": { "name": "answer", "parent": "1" }
}
'
How can I search both parent and child documents but return the children as a field in the parent doc? Thanks in advance.
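One possible approach, offered as an untested sketch rather than a confirmed answer, is to pair the parent match with an optional has_child clause that uses match_all and its own inner_hits, so that a directly matched parent also carries its children. The Python below only builds the request body; the function name and the inner_hits names matched_children and all_children are hypothetical. When several inner_hits appear in one request they need distinct names.

```python
import json


def parents_with_children(child_id, parent_id, child_type="answer"):
    """Build a query for parents that match directly or through a child."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        # Parents that have a matching child.
                        "has_child": {
                            "type": child_type,
                            "query": {"match": {"my_id": child_id}},
                            "inner_hits": {"name": "matched_children", "size": 1},
                        }
                    },
                    {
                        # Parents that match themselves.
                        "bool": {
                            "must": [{"match": {"my_id": parent_id}}],
                            # Optional clause: intended only to attach the
                            # children, not to exclude childless parents.
                            "should": [
                                {
                                    "has_child": {
                                        "type": child_type,
                                        "query": {"match_all": {}},
                                        "inner_hits": {"name": "all_children"},
                                    }
                                }
                            ],
                        }
                    },
                ]
            }
        }
    }


body = parents_with_children("4", "1")
print(json.dumps(body, indent=2))
```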

How to index some fields of an object

I have to log a dynamic object, but I only want to index some of its fields (not all of them). When I configure this behaviour, however, I can't search on those fields.
Here is an example of what I'm doing with Elasticsearch 6.x:
curl --request PUT 'http://localhost:9200/manuel-prova?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"log": {
"properties": {
"hello": {
"type": "object",
"enabled": false,
"properties": {
"my-api-key": {
"type": "text"
}
}
},
"check": {
"type": "boolean"
}
}
}
}
}'
Then I insert the data:
curl --request POST 'http://localhost:9200/manuel-prova/log?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"hello": {
"foo": "bar",
"my-api-key": "QWERTY"
},
"check": true
}'
Finally, I tried to query:
curl --request POST 'http://localhost:9200/manuel-prova/_search?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": [
{ "exists": { "field": "hello.my-api-key" } }
]
}
}
}'
This query doesn't work.
If I change to { "exists": { "field": "check" } } for example, it works.
Do you have any suggestion?
This is because your hello object is defined with enabled: false, which makes ES ignore the content of the field altogether, and hence, it is not searchable.
To fix that, remove enabled: false from the hello object and add "dynamic": false at the type level, so that only the explicitly mapped fields are indexed:
curl --request PUT 'http://localhost:9200/manuel-prova?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"log": {
"dynamic": false,
"properties": {
"hello": {
"type": "object",
"properties": {
"my-api-key": {
"type": "text"
}
}
},
"check": {
"type": "boolean"
}
}
}
}
}'
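The effect of the two settings can be mimicked in plain Python. The sketch below is only a simplified model of the rules discussed above (unmapped fields are skipped under "dynamic": false, and an object with "enabled": false is not parsed at all); the function indexed_paths is hypothetical and is not part of any Elasticsearch client.

```python
def indexed_paths(doc, properties, prefix=""):
    """List the field paths of doc that the mapping would index."""
    paths = []
    for name, value in doc.items():
        mapping = properties.get(name)
        if mapping is None:
            continue  # unmapped field: stays in _source but is not indexed
        if mapping.get("enabled") is False:
            continue  # disabled object: its content is ignored entirely
        path = prefix + name
        if isinstance(value, dict):
            paths += indexed_paths(value, mapping.get("properties", {}), path + ".")
        else:
            paths.append(path)
    return paths


doc = {"hello": {"foo": "bar", "my-api-key": "QWERTY"}, "check": True}

original = {
    "hello": {"type": "object", "enabled": False,
              "properties": {"my-api-key": {"type": "text"}}},
    "check": {"type": "boolean"},
}
fixed = {
    "hello": {"type": "object",
              "properties": {"my-api-key": {"type": "text"}}},
    "check": {"type": "boolean"},
}

print(indexed_paths(doc, original))  # ['check']
print(indexed_paths(doc, fixed))     # ['hello.my-api-key', 'check']
```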

How to fetch unique geo codes from Elasticsearch?

I'm new to Elasticsearch. I've created the index and inserted some documents with the following curl commands.
curl -XPUT 'localhost:9200/museums?pretty' -H 'Content-Type: application/json' -d'
{
"mappings": {
"doc": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
'
curl -XPOST 'localhost:9200/museums/doc/_bulk?refresh&pretty' -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "name": "Musée d\u0027Orsay"}
{"index":{"_id":7}}
{"location": "52.374081,4.912350", "name": "NEMO7 Science Museum"}
{"index":{"_id":8}}
{"location": "52.369219,4.901618", "name": "Museum8 Het Rembrandthuis"}
{"index":{"_id":9}}
{"location": "52.371667,4.914722", "name": "Nederlands9 Scheepvaartmuseum"}
{"index":{"_id":10}}
{"location": "51.222900,4.405200", "name": "Letterenhuis10"}
{"index":{"_id":11}}
{"location": "48.861111,2.336389", "name": "Musée11 du Louvre"}
{"index":{"_id":12}}
{"location": "48.860000,2.327000", "name": "Musée12 d\u0027Orsay"}
'
As you can see in the curl commands, I deliberately inserted some duplicate documents. Now I want to fetch all documents with unique geo codes and sort them in ascending order.
I found a sample curl command like the following:
curl -XPOST 'localhost:9200/museums/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
"aggs" : {
"rings_around_amsterdam" : {
"geo_distance" : {
"field" : "location",
"origin" : "52.3760, 4.894",
"ranges" : [
{ "to" : 100000 },
{ "from" : 100000, "to" : 300000 },
{ "from" : 300000 }
]
}
}
}
}
'
But it uses ranges. I just want to fetch the unique geo codes and sort them in ascending order. I googled as well, but every approach I found for fetching unique documents works only on text or numeric fields, not on geo_point fields.
I need some help.
Try:
curl -XPOST 'localhost:9200/museums/_search?size=0&pretty' -H 'Content-Type: application/json' -d'
{
"size" : 0,
"aggs": {
"distinct_geo_distance" : {
"cardinality" : {
"field" : "location"
}
}
}
}
'
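A cardinality aggregation reports a count rather than the values themselves. If the list of distinct coordinates is what is needed, one fallback is to deduplicate on the client once the documents have been fetched. The sketch below assumes the hits are already in memory as dicts with a "lat,lon" string, and it interprets "ascending" as ordering by latitude, then longitude; both are assumptions, and the function name is hypothetical.

```python
def unique_sorted_locations(docs):
    """Return the distinct (lat, lon) pairs, sorted by latitude then longitude."""
    points = {
        tuple(float(part) for part in doc["location"].split(","))
        for doc in docs
    }
    return sorted(points)


# A subset of the sample data above, including duplicated coordinates.
docs = [
    {"location": "52.374081,4.912350", "name": "NEMO Science Museum"},
    {"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"},
    {"location": "51.222900,4.405200", "name": "Letterenhuis"},
    {"location": "48.861111,2.336389", "name": "Musée du Louvre"},
    {"location": "52.374081,4.912350", "name": "NEMO7 Science Museum"},
    {"location": "48.861111,2.336389", "name": "Musée11 du Louvre"},
]

locations = unique_sorted_locations(docs)
for lat, lon in locations:
    print(lat, lon)
```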
