I have data in Elasticsearch in the following format:
"segments": [
{"id": "ABC", "value":123},
{"id": "PQR", "value":345},
{"id": "DEF", "value":567},
{"id": "XYZ", "value":789}
]
I want to retrieve all segments where id is "ABC" or "DEF".
I looked at the docs (https://www.elastic.co/guide/en/elasticsearch/reference/7.9/query-dsl-nested-query.html) and a few examples on YouTube, but they all seem to retrieve only a single object, while I want to retrieve more than one.
Is there a way to do this?
You can use a nested query with inner hits.
I assume your index mapping looks like the one below, with the segments field defined as nested:
"mappings": {
"properties": {
"segments": {
"type": "nested",
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "long"
}
}
}
}
}
You can use the query below:
{
"_source" : false,
"query": {
"nested": {
"path": "segments",
"query": {
"terms": {
"segments.id.keyword": [
"ABC",
"DEF"
]
}
},
"inner_hits": {}
}
}
}
Response:
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_score": 1,
"inner_hits": {
"segments": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_nested": {
"field": "segments",
"offset": 0
},
"_score": 1,
"_source": {
"id": "ABC",
"value": 123
}
},
{
"_index": "73895503",
"_id": "TmM8iYMBrWOLJcwdvQGG",
"_nested": {
"field": "segments",
"offset": 2
},
"_score": 1,
"_source": {
"id": "DEF",
"value": 567
}
}
]
}
}
}
}
]
}
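As a rough sketch of how a client might consume this response, the Python below builds the same nested query and pulls the matching segments out of inner_hits. The names build_segments_query and matched_segments are hypothetical helpers, not part of any Elasticsearch client, and the response is assumed to be a dict with the shape shown above.

```python
def build_segments_query(ids):
    """Build the nested terms query shown above for the given segment ids."""
    return {
        "_source": False,
        "query": {
            "nested": {
                "path": "segments",
                "query": {"terms": {"segments.id.keyword": list(ids)}},
                "inner_hits": {},
            }
        },
    }


def matched_segments(response):
    """Collect the _source of every inner hit under 'segments', keyed by parent document id."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["segments"]["hits"]["hits"]
        result[hit["_id"]] = [h["_source"] for h in inner]
    return result
```

Keep in mind that inner_hits returns only the top three matching nested objects per document by default, so if you expect more matches per document you would likely need to set "size" inside "inner_hits".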
I have products in my index. Documents are basically structured like these:
{
"_id": "product",
"_source": {
...
"type": "product",
"id": 1,
"mainTaxon": {
"name": "T-SHIRT",
},
"attributes": [
{
"code": "name",
"name": "Name",
"value": [
"BANANA T-SHIRT"
],
"score": 50
},
]
}
},
{
"_id": "product",
"_source": {
...
"type": "product",
"id": 2,
"mainTaxon": {
"name": "JEANS",
},
"attributes": [
{
"code": "name",
"name": "Name",
"value": [
"BANANA JEANS"
],
"score": 50
},
]
}
}
}
When I search for 'BANANA', I would like to prioritize products whose mainTaxon is different from JEANS. So every product with the mainTaxon name T-SHIRT or anything else would be listed before products with mainTaxon JEANS.
You can use a boosting query to prioritize documents:
{
"query": {
"boosting": {
"positive": {
"match": {
"attributes.value": "banana"
}
},
"negative": {
"match": {
"mainTaxon.name": "JEANS"
}
},
"negative_boost": 0.5
}
}
}
The search result will be:
"hits": [
{
"_index": "67164768",
"_type": "_doc",
"_id": "1",
"_score": 0.5364054,
"_source": {
"type": "product",
"id": 1,
"mainTaxon": {
"name": "T-SHIRT"
},
"attributes": [
{
"code": "name",
"name": "Name",
"value": [
"BANANA T-SHIRT"
],
"score": 50
}
]
}
},
{
"_index": "67164768",
"_type": "_doc",
"_id": "2",
"_score": 0.32743764,
"_source": {
"type": "product",
"id": 2,
"mainTaxon": {
"name": "JEANS"
},
"attributes": [
{
"code": "name",
"name": "Name",
"value": [
"BANANA JEANS"
],
"score": 50
}
]
}
}
]
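To make the effect of negative_boost more concrete, here is a small Python sketch. It builds the boosting query above and then mimics the adjustment locally: a document that also matches the negative clause has its positive score multiplied by negative_boost. The helper names, the simplified document shape, and any scores you pass in are assumptions for illustration only; Elasticsearch performs the real scoring internally.

```python
def boosting_query(text, demoted_taxon, negative_boost=0.5):
    """Build the boosting query shown above."""
    return {
        "query": {
            "boosting": {
                "positive": {"match": {"attributes.value": text}},
                "negative": {"match": {"mainTaxon.name": demoted_taxon}},
                "negative_boost": negative_boost,
            }
        }
    }


def demote(docs, demoted_taxon, negative_boost=0.5):
    """Illustrative only: scale the score of demoted docs, then sort by final score."""
    rescored = []
    for doc in docs:
        factor = negative_boost if doc["taxon"] == demoted_taxon else 1.0
        rescored.append({**doc, "score": doc["score"] * factor})
    return sorted(rescored, key=lambda d: d["score"], reverse=True)
```

A negative_boost between 0 and 1 lowers the score of JEANS products without excluding them, which is why they still appear in the results, just further down.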
Imagine I have a document like this:
{
"_index": "bank-accounts",
"_type": "_doc",
"_id": "1",
"_version": 4,
"_seq_no": 3,
"_primary_term": 1,
"found": true,
"_source": {
"id": 1,
"balance": 140,
"transactions": [
{
"id": "42f52474-a49b-4707-86e4-e983efb4ab31",
"type": "Deposit",
"amount": 100
},
{
"id": "3f8396a3-d747-4a4c-8926-cdcedea6b5c3",
"type": "Deposit",
"amount": 50
},
{
"id": "5693585d-6356-4d1a-8d7b-cac5d0dab39f",
"type": "Withdraw",
"amount": 10
}
],
"accountCreatedAt": 1614029062764
}
}
I want to return only the transactions array in a query.
How would I do this within Elasticsearch? Is this even possible? I've achieved a result using "fields": ["transactions.*"], but it returns each of the fields in separate arrays:
{
...
"hits": [
{
"_index": "bank-accounts",
"_type": "_doc",
"_id": "1",
"_score": 1,
"fields": {
"transactions.id": [
"42f52474-a49b-4707-86e4-e983efb4ab31",
"3f8396a3-d747-4a4c-8926-cdcedea6b5c3",
"5693585d-6356-4d1a-8d7b-cac5d0dab39f"
],
"transactions.amount": [
100,
50,
10
],
"transactions.type": [
"Deposit",
"Deposit",
"Withdraw"
],
...
}
}
]
}
}
I could work with this, but I want something simpler to handle. Note that I have to use the document id in my search.
I expect to get something like this:
{
...
"hits": [
{
"_index": "bank-accounts",
"_type": "_doc",
"_id": "1",
"_score": 3,
"transactions": [
{
"id": "42f52474-a49b-4707-86e4-e983efb4ab31",
"type": "Deposit",
"amount": 100
},
{
"id": "3f8396a3-d747-4a4c-8926-cdcedea6b5c3",
"type": "Deposit",
"amount": 50
},
{
"id": "5693585d-6356-4d1a-8d7b-cac5d0dab39f",
"type": "Withdraw",
"amount": 10
},
....
]
}
]
}
}
Is this possible to achieve?
If you only want to return the transactions array (you have not mentioned any query condition to search on), you can achieve that using source filtering.
Here is a working example.
Index Mapping:
{
"mappings": {
"properties": {
"transactions": {
"type": "nested"
}
}
}
}
Index Data:
{
"id": 1,
"balance": 140,
"transactions": [
{
"id": "42f52474-a49b-4707-86e4-e983efb4ab31",
"type": "Deposit",
"amount": 100
},
{
"id": "3f8396a3-d747-4a4c-8926-cdcedea6b5c3",
"type": "Deposit",
"amount": 50
},
{
"id": "5693585d-6356-4d1a-8d7b-cac5d0dab39f",
"type": "Withdraw",
"amount": 10
}
],
"accountCreatedAt": 1614029062764
}
Search Query:
{
"_source": [
"transactions.*"
]
}
Search Result:
"hits": [
{
"_index": "66324257",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"transactions": [
{
"amount": 100,
"id": "42f52474-a49b-4707-86e4-e983efb4ab31",
"type": "Deposit"
},
{
"amount": 50,
"id": "3f8396a3-d747-4a4c-8926-cdcedea6b5c3",
"type": "Deposit"
},
{
"amount": 10,
"id": "5693585d-6356-4d1a-8d7b-cac5d0dab39f",
"type": "Withdraw"
}
]
}
}
]
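Since the question mentions searching by document id, the source filter can be combined with an ids query. The Python below is a minimal sketch under that assumption: transactions_query and extract_transactions are hypothetical helper names, and the response is assumed to be the full search response dict, with the hits array shown above nested under a top-level "hits" object.

```python
def transactions_query(doc_id):
    """Source-filtered search restricted to a single document id via an ids query."""
    return {
        "_source": ["transactions.*"],
        "query": {"ids": {"values": [str(doc_id)]}},
    }


def extract_transactions(response):
    """Return the transactions array of each hit, keyed by document id."""
    return {
        hit["_id"]: hit["_source"].get("transactions", [])
        for hit in response["hits"]["hits"]
    }
```

With source filtering, each transaction comes back as a whole object, so no zipping of parallel arrays is needed on the client side.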
Can somebody help me with alerting via X-Pack for an energy monitoring system project? The main problem is that I can't retrieve the 'Value' data from the database, which I want to compare later against the upper and lower thresholds.
Here is the index:
PUT /test-1
{
"mappings": {
"Test1": {
"properties": {
"Value": {
"type": "integer"
},
"date": {
"type": "date",
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
},
"UpperThreshold": {
"type": "integer"
},
"LowerThreshold": {
"type": "integer"
}
}
}
}
}
Here is an example of the input:
POST /test-1/Test1
{
"Value": "500",
"date": "2017-06-13T16:20:00.000Z",
"UpperThreshold":"450",
"LowerThreshold": "380"
}
This is my alerting code:
{
"trigger": {
"schedule": {
"interval": "10s"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"logs"
],
"types": [],
"body": {
"query": {
"match": {
"message": "error"
}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.total": {
"gt": 0
}
}
},
"actions": {
"send_email": {
"email": {
"profile": "standard",
"to": [
"<account#gmail.com>"
],
"subject": "Watcher Notification",
"body": {
"text": "{{ctx.payload.hits.total}} error logs found"
}
}
}
}
}
Here is the response I got from the alerting plugin:
{
"watch_id": "Alerting-Test",
"state": "execution_not_needed",
"_status": {
"state": {
"active": true,
"timestamp": "2017-07-26T15:27:35.497Z"
},
"last_checked": "2017-07-26T15:27:38.625Z",
"actions": {
"logging": {
"ack": {
"timestamp": "2017-07-26T15:27:35.497Z",
"state": "awaits_successful_execution"
}
}
}
},
"trigger_event": {
"type": "schedule",
"triggered_time": "2017-07-26T15:27:38.625Z",
"schedule": {
"scheduled_time": "2017-07-26T15:27:38.175Z"
}
},
"input": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"test-1"
],
"types": [
"Test1"
],
"body": {
"query": {
"match_all": {}
}
}
}
}
},
"condition": {
"compare": {
"ctx.payload.hits.hits.0.Value": {
"gt": 450
}
}
},
"metadata": {
"name": "Alerting-Test"
},
"result": {
"execution_time": "2017-07-26T15:27:38.625Z",
"execution_duration": 0,
"input": {
"type": "search",
"status": "success",
"payload": {
"_shards": {
"total": 5,
"failed": 0,
"successful": 5
},
"hits": {
"hits": [
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-22T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "350",
"UpperThreshold": "450"
},
"_id": "AV1-1P3lArbJ1tbnct4e",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-22T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "4100",
"UpperThreshold": "450"
},
"_id": "AV1-1Sq0ArbJ1tbnct4v",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-24T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "450",
"UpperThreshold": "450"
},
"_id": "AV1-1eLJArbJ1tbnct6G",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T00:00:00.000Z",
"LowerThreshold": "380",
"Value": "400",
"UpperThreshold": "450"
},
"_id": "AV1-1VUzArbJ1tbnct5A",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1X4FArbJ1tbnct5R",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-23T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1YySArbJ1tbnct5T",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-26T00:00:00.000Z",
"LowerThreshold": "380",
"Value": "4700",
"UpperThreshold": "450"
},
"_id": "AV1-1mflArbJ1tbnct67",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-26T06:00:00.000Z",
"LowerThreshold": "380",
"Value": "390",
"UpperThreshold": "450"
},
"_id": "AV1-1oluArbJ1tbnct7M",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-21T12:00:00.000Z",
"LowerThreshold": "380",
"Value": "400",
"UpperThreshold": "450"
},
"_id": "AV1-1IrZArbJ1tbnct3r",
"_score": 1
},
{
"_index": "test-1",
"_type": "Test1",
"_source": {
"date": "2017-07-21T18:00:00.000Z",
"LowerThreshold": "380",
"Value": "440",
"UpperThreshold": "450"
},
"_id": "AV1-1LwzArbJ1tbnct38",
"_score": 1
}
],
"total": 20,
"max_score": 1
},
"took": 1,
"timed_out": false
},
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"test-1"
],
"types": [
"Test1"
],
"body": {
"query": {
"match_all": {}
}
}
}
}
},
"condition": {
"type": "compare",
"status": "success",
"met": false,
"compare": {
"resolved_values": {
"ctx.payload.hits.hits.0.Value": null
}
}
},
"actions": []
},
"messages": []
}
I'd really appreciate your help!
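Looking at the payload in the response, Value sits under _source in each hit rather than directly on the hit, which would explain why ctx.payload.hits.hits.0.Value resolves to null. The Python sketch below imitates a dotted-path lookup on a trimmed copy of that payload to show the difference. resolve_path is a hypothetical helper, not Watcher's actual implementation, and the idea that the condition path should go through _source is an inference from the payload shape shown here.

```python
def resolve_path(payload, dotted_path):
    """Walk a dotted path through nested dicts and lists; return None if a step is missing."""
    current = payload
    for part in dotted_path.split("."):
        if isinstance(current, list):
            if not part.isdigit() or int(part) >= len(current):
                return None
            current = current[int(part)]
        elif isinstance(current, dict):
            if part not in current:
                return None
            current = current[part]
        else:
            return None
    return current


# Trimmed copy of the first hit from the payload in the response above.
payload = {
    "hits": {
        "hits": [
            {
                "_index": "test-1",
                "_type": "Test1",
                "_source": {"Value": "350", "UpperThreshold": "450", "LowerThreshold": "380"},
            }
        ]
    }
}

missing = resolve_path(payload, "hits.hits.0.Value")          # None: no Value directly on the hit
found = resolve_path(payload, "hits.hits.0._source.Value")    # "350": the field lives under _source
```

Note that _source returns Value as the string "350", exactly as it was indexed, so a numeric comparison may need the documents to be indexed with numeric values.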
Despite the way Lucene structures nested documents, I'm trying to get my nested fields highlighted when a search term matches their content.
Here is the explanation from the Elasticsearch documentation (mapping, nested type):
Internal Implementation
Internally, nested objects are indexed as additional documents, but, since they can be guaranteed to be indexed within the same "block", it allows for extremely fast joining with parent docs.
Those internal nested documents are automatically masked away when doing operations against the index (like searching with a match_all query), and they bubble out when using the nested query.
Because nested docs are always masked to the parent doc, the nested docs can never be accessed outside the scope of the nested query. For example stored fields can be enabled on fields inside nested objects, but there is no way of retrieving them, since stored fields are fetched outside of the nested query scope.
0. In my case
I have an Elasticsearch index containing a mapping like the following:
{
"my_documents": {
"dynamic_date_formats": [
"dd.MM.yyyy",
"yyyy-MM-dd",
"yyyy-MM-dd HH:mm:ss"
],
"index_analyzer": "Analyzer2_index",
"search_analyzer": "Analyzer2_search_decompound",
"_timestamp": {
"enabled": true
},
"properties": {
"identifier": {
"type": "string"
},
"description": {
"type": "multi_field",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string"
}
}
},
"files": {
"type": "nested",
"include_in_root": true,
"properties": {
"content": {
"type": "string",
"include_in_root": true
}
}
},
"and then some other": "normal string fields"
}
}
}
I'm trying to execute a query like this:
{
"size": 100,
"query": {
"bool": {
"should": [
{
"nested": {
"path": "files",
"query": {
"bool": {
"should": {
"match": {
"content": {
"query": "burpcontrol",
"minimum_should_match": "85%"
}
}
}
}
}
}
},
{
"match": {
"description": {
"query": "burpcontrol",
"minimum_should_match": "85%"
}
}
},
{
"match": {
"identifier": {
"query": "burpcontrol",
"minimum_should_match": "85%"
}
}
} ]
}
},
"highlight": {
"pre_tags": [
"<span style=\"background-color: yellow\">"
],
"post_tags": [
"</span>"
],
"order": "score",
"no_match_size": 100,
"fragment_size": 50,
"number_of_fragments": 3,
"require_field_match": true,
"fields": {
"files.content": {},
"description": {},
"identifier": {}
}
}
}
The problems I have are:
1. require_field_match
If I use "require_field_match": false, then even though highlighting doesn't work on nested fields, the search term is highlighted in ALL the fields anyway.
This is the solution I'm currently using, but the performance is horrible: the query takes about 5 seconds for 10 documents, 25 seconds for 50 documents, and 50 seconds for 100 documents.
If I remove the nested field from the highlighting, everything is lightning fast!
2. include_in_root
I would like to have a flattened version of my nested fields (so that they are stored as normal objects/fields).
To do this I should specify
"files": { "type": "nested", "include_in_root": true, ...
but for some reason, after reindexing, I cannot see any additional flattened field in the document root (I was expecting something like "files.content":["content1", "content2", "..."]).
If it worked, it would be possible to access the content of the nested field through the flattened field and perform the highlighting on it.
Do you know whether it is possible to achieve good (and performant) highlighting on nested fields, or can you at least suggest why my query is so slow? (I have already optimised the fragments.)
There are a number of things you can do here, with a parent/child relationship. I'll go over a few, and hopefully that will lead you in the right direction; it will still take lots of testing to figure out whether this solution is going to be more performant for you. Also, I left out a few of the details of your setup, for clarity. Please forgive the long post.
I set up a parent/child mapping as follows:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"parent_doc": {
"properties": {
"identifier": {
"type": "string"
},
"description": {
"type": "string"
}
}
},
"child_doc": {
"_parent": {
"type": "parent_doc"
},
"properties": {
"content": {
"type": "string"
}
}
}
}
}
Then added some test docs:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"parent_doc","_id":1}}
{"identifier": "first", "description":"some special text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is special"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":1}}
{"content":"text that is not"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":2}}
{"identifier": "second", "description":"some different text"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":2}}
{"content":"different child text, but special"}
{"index":{"_index":"test_index","_type":"parent_doc","_id":3}}
{"identifier": "third", "description":"we don't want this parent"}
{"index":{"_index":"test_index","_type":"child_doc","_parent":3}}
{"content":"or this child"}
If I'm understanding your specs correctly, we would want a query for "special" to return every one of these documents except the last two (correct me if I'm wrong). We want docs that match the text, have a child that matches the text, or have a parent that matches the text.
We can get back parents that match the query like this:
POST /test_index/parent_doc/_search
{
"query": {
"match": {
"description": "special"
}
},
"highlight": {
"fields": {
"description": {},
"identifier": {}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.1263815,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "1",
"_score": 1.1263815,
"_source": {
"identifier": "first",
"description": "some special text"
},
"highlight": {
"description": [
"some <em>special</em> text"
]
}
}
]
}
}
And we can get back children that match the query like this:
POST /test_index/child_doc/_search
{
"query": {
"match": {
"content": "special"
}
},
"highlight": {
"fields": {
"content": {}
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.92364895,
"hits": [
{
"_index": "test_index",
"_type": "child_doc",
"_id": "geUFenxITZSL7epvB568uA",
"_score": 0.92364895,
"_source": {
"content": "text that is special"
},
"highlight": {
"content": [
"text that is <em>special</em>"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "IMHXhM3VRsCLGkshx52uAQ",
"_score": 0.80819285,
"_source": {
"content": "different child text, but special"
},
"highlight": {
"content": [
"different child text, but <em>special</em>"
]
}
}
]
}
}
We can get back parents that match the text and children that match the text like this:
POST /test_index/parent_doc,child_doc/_search
{
"query": {
"multi_match": {
"query": "special",
"fields": ["description", "content"]
}
},
"highlight": {
"fields": {
"description": {},
"identifier": {},
"content": {}
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1.1263815,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "1",
"_score": 1.1263815,
"_source": {
"identifier": "first",
"description": "some special text"
},
"highlight": {
"description": [
"some <em>special</em> text"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "geUFenxITZSL7epvB568uA",
"_score": 0.75740534,
"_source": {
"content": "text that is special"
},
"highlight": {
"content": [
"text that is <em>special</em>"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "IMHXhM3VRsCLGkshx52uAQ",
"_score": 0.6627297,
"_source": {
"content": "different child text, but special"
},
"highlight": {
"content": [
"different child text, but <em>special</em>"
]
}
}
]
}
}
However, to get back all the docs related to this query, we need to use a bool query:
POST /test_index/parent_doc,child_doc/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "special",
"fields": [
"description",
"content"
]
}
},
{
"has_child": {
"type": "child_doc",
"query": {
"match": {
"content": "special"
}
}
}
},
{
"has_parent": {
"type": "parent_doc",
"query": {
"match": {
"description": "special"
}
}
}
}
]
}
},
"highlight": {
"fields": {
"description": {},
"identifier": {},
"content": {}
}
},
"fields": ["_parent", "_source"]
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0.8866254,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "1",
"_score": 0.8866254,
"_source": {
"identifier": "first",
"description": "some special text"
},
"highlight": {
"description": [
"some <em>special</em> text"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "geUFenxITZSL7epvB568uA",
"_score": 0.67829096,
"_source": {
"content": "text that is special"
},
"fields": {
"_parent": "1"
},
"highlight": {
"content": [
"text that is <em>special</em>"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "IMHXhM3VRsCLGkshx52uAQ",
"_score": 0.18709806,
"_source": {
"content": "different child text, but special"
},
"fields": {
"_parent": "2"
},
"highlight": {
"content": [
"different child text, but <em>special</em>"
]
}
},
{
"_index": "test_index",
"_type": "child_doc",
"_id": "NiwsP2VEQBKjqu1M4AdjCg",
"_score": 0.12531912,
"_source": {
"content": "text that is not"
},
"fields": {
"_parent": "1"
}
},
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "2",
"_score": 0.12531912,
"_source": {
"identifier": "second",
"description": "some different text"
}
}
]
}
}
(I included the "_parent" field to make it easier to see why docs were included in the results.)
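If you want to present these mixed results per parent document, one option is to group the hits on the client using the "_parent" field. The Python below is only a sketch of that idea: group_by_parent is a hypothetical helper, and it assumes each child hit carries "fields": {"_parent": ...} as in the response above, while parent hits have no such field.

```python
from collections import defaultdict


def group_by_parent(hits):
    """Group search hits into {parent_id: {"parent": hit or None, "children": [hits]}}."""
    groups = defaultdict(lambda: {"parent": None, "children": []})
    for hit in hits:
        parent_id = hit.get("fields", {}).get("_parent")
        if parent_id is None:
            # No _parent field: the hit is itself a parent document.
            groups[hit["_id"]]["parent"] = hit
        else:
            groups[parent_id]["children"].append(hit)
    return dict(groups)
```

Calling group_by_parent(response["hits"]["hits"]) on the response above would put the two children of parent 1 and the single child of parent 2 next to their parents, with any highlight fragments still attached to the individual hits.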
Let me know if this helps.
Here is the code I used:
http://sense.qbox.io/gist/d69a4d6531dc063faa4b4e094cff2a472a73c5a6