Nested filter in Elasticsearch aggregation query

I am running the following aggregation query with a nested filter:
GET <indexname>/_search
{
"aggs": {
"NAME": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_filter": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.City.keyword": [
"Rewa"
]
}
}
}
},
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.LeadID": [
27961
]
}
}
}
}
]
}
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
}
}
I have the following document:
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
However, in the response I am getting the following result:
"aggregations" : {
"NAME" : {
"doc_count" : 4332,
"agg_filter" : {
"doc_count" : 1,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
}
]
}
}
}
}
Question: As per the source document, there are 3 nested 'crm.LeadStatusHistory' documents for crm.LeadID = 27961. However, the result shows a doc_count of 1 for agg_filter instead of 3. Can you please let me know the reason for this?

Your agg_filter is applied in the crm.LeadStatusHistory nested context, so it will target only 1 doc (each LeadStatusHistory entry is indexed as its own nested doc).
I built a query that shows this, and I think it will answer your problem. You will see the different doc_count for each aggregation.
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"home_type_name": {
"terms": {
"field": "crm.LeadStatusHistory.StatusID"
}
}
}
}
}
}
}
}
}
}
With this one you can count them with a script (and add a filter if needed):
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_LeadStatusHistory_sum": {
"sum": {
"script": "doc['crm.LeadStatusHistory.StatusID'].values.length"
}
}
}
}
}
}
}
}
}
}
Note: if you want to get the number of nested documents, take a look at inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-inner-hits
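As a sketch only (path and field names taken from the question, the inner_hits size is an assumption), an inner_hits request can be built as a plain Python dict; inner_hits returns the matching nested docs themselves, so their per-hit count is directly visible:

```python
import json

# Sketch: a nested query with inner_hits, reusing the path and field
# names from the question. Not a verified query for this exact mapping.
body = {
    "query": {
        "nested": {
            "path": "crm",
            "query": {"term": {"crm.LeadID": 27961}},
            "inner_hits": {"size": 100},  # size 100 is an assumption
        }
    }
}

print(json.dumps(body, sort_keys=True))
```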

I disagree with the statement that 'crm.LeadStatusHistory' is one doc. I have run an aggregation query on crm.LeadStatusHistory without filters.
GET crm/_search
{
"_source": ["crm.LeadID","crm.LeadStatusHistory.StatusID","crm.City"],
"size": 10000,
"query": {
"nested": {
"path": "crm",
"query": {
"match": {
"crm.LeadID": "27961"
}
}
}
},
"aggs": {
"agg_statuscount": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
I get the following response from the above query, which shows a doc_count of 6 for 'agg_statuscount' without filters:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
]
},
"aggregations" : {
"agg_statuscount" : {
"doc_count" : 6,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 2
},
{
"key" : 2,
"doc_count" : 2
},
{
"key" : 3,
"doc_count" : 2
}
]
}
}
}
}
Hence, with crm.LeadID = 27961 in the aggregation filter, I expected 3 'crm.LeadStatusHistory' docs. Currently the response is 1, as in my original question.
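The doc_count of 6 above is consistent with how nested fields are indexed: every element of crm and of crm.LeadStatusHistory becomes its own hidden Lucene document. A small Python sketch of that flattening, using the _source from the question, reproduces the counts:

```python
from collections import Counter

# _source of the document from the question.
source = {
    "crm": [
        {"LeadStatusHistory": [{"StatusID": 3}, {"StatusID": 2}, {"StatusID": 1}],
         "LeadID": 27961, "City": "Rewa"},
        {"LeadStatusHistory": [{"StatusID": 1}, {"StatusID": 3}, {"StatusID": 2}],
         "LeadID": 27959, "City": "Rewa"},
    ]
}

# The nested agg on crm.LeadStatusHistory sees every history entry of every
# crm element as a separate doc: 2 crm elements x 3 entries each = 6.
history_docs = [h for c in source["crm"] for h in c["LeadStatusHistory"]]
print(len(history_docs))  # 6, matching agg_statuscount.doc_count

# Per-StatusID buckets, matching the terms sub-aggregation.
buckets = Counter(h["StatusID"] for h in history_docs)
print(sorted(buckets.items()))  # [(1, 2), (2, 2), (3, 2)]
```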

Related

Query filter for searching rollup index works with epoch time, fails with date math

How do we query (filter) a rollup index?
For example, based on the query here
Request:
{
"size": 0,
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
Response:
{
"took" : 93,
"timed_out" : false,
"terminated_early" : false,
"_shards" : ... ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"timeline" : {
"buckets" : [
{
"key_as_string" : "2018-01-18T00:00:00.000Z",
"key" : 1516233600000,
"doc_count" : 6,
"nodes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 5.1499998569488525
}
},
{
"key" : "b",
"doc_count" : 2,
"max_temperature" : {
"value" : 201.0
},
"avg_voltage" : {
"value" : 5.700000047683716
}
},
{
"key" : "c",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 4.099999904632568
}
}
]
}
}
]
}
}
}
How can I filter, say, the last 3 days? Is it possible?
For a test case, I used a fixed_interval rate of 1m (one minute, and also 60 minutes) and I tried the following, and the error was "all query shards failed". Is it possible to query-filter rollup aggregations?
Test Query for searching rollup index
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-3d/d",
"lt": "now/d"
}
}
},
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
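Since the title notes that epoch time works where date math fails, one workaround sketch is to compute the epoch-millisecond bounds client-side and send those instead of now-3d/d (whether this avoids the shard failures on a rollup index is an assumption to verify):

```python
from datetime import datetime, timedelta, timezone

def day_floor(dt):
    """Round down to midnight UTC, like the /d rounding in date math."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)

# Fixed "now" so the example is deterministic; use datetime.now(timezone.utc)
# in real code.
now = datetime(2024, 1, 10, 15, 30, tzinfo=timezone.utc)
gte = int(day_floor(now - timedelta(days=3)).timestamp() * 1000)  # now-3d/d
lt = int(day_floor(now).timestamp() * 1000)                       # now/d

range_filter = {
    "range": {"timestamp": {"gte": gte, "lt": lt, "format": "epoch_millis"}}
}
print(lt - gte)  # 259200000 ms, i.e. exactly 3 days
```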

Elasticsearch sub-aggregation

With the following query, I get the minimum value in each chunk of 15 minutes, using the moving_fn function. Now I need to get the maximum value in each 1-hour chunk from the previous request. As I understand it, another aggregation cannot be applied after moving_fn. How can I do this?
This is my query:
GET logstash-2021.12.2*/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"@timestamp": {
"gte": "now-24h"
}
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"company": "BLAH-BLAH"
}
}
]
}
}
]
}
},
"size": 0,
"aggs": {
"myDatehistogram": {
"date_histogram": {
"field": "@timestamp",
"interval": "1m",
"offset": "+30s"
}, "aggs": {
"the_count": {
"moving_fn": {
"buckets_path": "_count",
"window": 15,
"script": "MovingFunctions.min(values)"
}
}
}
}
}
}
My response:
"aggregations" : {
"myDatehistogram" : {
"buckets" : [
{
"key_as_string" : "2021-12-25T05:58:30.000Z",
"key" : 1640411910000,
"doc_count" : 1196,
"the_count" : {
"value" : null
}
},
{
"key_as_string" : "2021-12-25T05:59:30.000Z",
"key" : 1640411970000,
"doc_count" : 1942,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:00:30.000Z",
"key" : 1640412030000,
"doc_count" : 1802,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:01:30.000Z",
"key" : 1640412090000,
"doc_count" : 1735,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:02:30.000Z",
"key" : 1640412150000,
"doc_count" : 1699,
"the_count" : {
"value" : 1196.0
}
},
{
"key_as_string" : "2021-12-25T06:03:30.000Z",
"key" : 1640412210000,
"doc_count" : 1506,
"the_count" : {
"value" : 1196.0
}
}
From this response, I need to get the maximum value for each hour. Thank you in advance
Just add a second agg:
"myDatehistogram": {
"date_histogram": {
"field": "@timestamp",
"interval": "1m",
"offset": "+30s"
}, "aggs": {
"min_15": {
"moving_fn": {
"buckets_path": "_count",
"window": 15,
"script": "MovingFunctions.min(values)"
}
},
"max_60": {
"moving_fn": {
"buckets_path": "_count",
"window": 60,
"script": "MovingFunctions.max(values)"
}
}
}
}
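MovingFunctions.min/max operate over a trailing window of bucket values. A quick Python sketch of both windows (window sizes from the answer; the window excludes the current bucket, as in the response above where the first bucket's value is null) shows how the two sibling aggs differ:

```python
def moving_fn(counts, window, fn):
    """Simulate moving_fn: for each bucket, apply fn to the previous
    `window` values, with the current bucket excluded (so the first
    bucket has no window and yields None, like the null in the response)."""
    out = []
    for i in range(len(counts)):
        vals = counts[max(0, i - window):i]
        out.append(fn(vals) if vals else None)
    return out

# doc_count values from the example response.
counts = [1196, 1942, 1802, 1735, 1699, 1506]
print(moving_fn(counts, 15, min))  # [None, 1196, 1196, 1196, 1196, 1196]
print(moving_fn(counts, 60, max))  # [None, 1196, 1942, 1942, 1942, 1942]
```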

How do I compare two source IPs from two different logs in Elasticsearch

In Elasticsearch I want to compare two logs (a NAT log and a gateway log) with a DSL query.
In the NAT log there is srcip1, and in the gateway log there is srcip2.
If the condition srcip1 === srcip2 is satisfied, I want "agent.id" displayed in the result.
Below is the correlated query I have made so far:
{
"query": {
"bool": {
"should": [
{
"match": {
"location": "\\Users\\Saad\\Desktop\\nat.log"
}
},
{
"match": {
"location": "\\Users\\Saad\\Desktop\\attendance-logs-with-ports.log"
}
}
],
"must": [
{
"term": {
"data.srcip": "1.1.1.1"
}
}
]
}
},
"fields": [
"data.srcip1"
],
"_source": false
}
I tried multiple things but did not succeed.
To display summaries of data you use aggregations. If you want to compare the different agents depending on the log type for a certain IP, the query will be this one:
Ingest data
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\nat.log",
"data": {
"srcip1": "1.1.1.1"
},
"agent": {
"id": "agent_1"
}
}
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\attendance-logs-with-ports.log",
"data": {
"srcip2": "1.1.1.1"
},
"agent": {
"id": "agent_1"
}
}
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\nat.log",
"data": {
"srcip1": "1.1.1.1"
},
"agent": {
"id": "agent_2"
}
}
Request
POST test_saad/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"data.srcip1.keyword": "1.1.1.1"
}
},
{
"term": {
"data.srcip2.keyword": "1.1.1.1"
}
}
],
"minimum_should_match": 1
}
},
{
"bool": {
"should": [
{
"term": {
"location.keyword": """\Users\Saad\Desktop\nat.log"""
}
},
{
"term": {
"location.keyword": """\Users\Saad\Desktop\attendance-logs-with-ports.log"""
}
}
],
"minimum_should_match": 1
}
}
]
}
},
"aggs": {
"log_types": {
"terms": {
"field": "location.keyword",
"size": 10
},
"aggs": {
"agent_types": {
"terms": {
"field": "agent.id.keyword",
"size": 10
}
}
}
}
}
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"log_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : """\Users\Saad\Desktop\nat.log""",
"doc_count" : 2,
"agent_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "agent_1",
"doc_count" : 1
},
{
"key" : "agent_2",
"doc_count" : 1
}
]
}
},
{
"key" : """\Users\Saad\Desktop\attendance-logs-with-ports.log""",
"doc_count" : 1,
"agent_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "agent_1",
"doc_count" : 1
}
]
}
}
]
}
}
}
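The nested terms aggregations in this answer effectively group docs by location and then by agent.id; an agent that appears under both locations for the same IP is the "match" the question asks for. A sketch of that grouping over the three ingested sample docs:

```python
from collections import defaultdict

# The three sample docs ingested above, flattened for the sketch.
docs = [
    {"location": r"\Users\Saad\Desktop\nat.log", "srcip": "1.1.1.1", "agent": "agent_1"},
    {"location": r"\Users\Saad\Desktop\attendance-logs-with-ports.log", "srcip": "1.1.1.1", "agent": "agent_1"},
    {"location": r"\Users\Saad\Desktop\nat.log", "srcip": "1.1.1.1", "agent": "agent_2"},
]

# terms agg on location.keyword with a sub-terms agg on agent.id.keyword,
# filtered to one IP, simulated as a dict of sets.
buckets = defaultdict(set)
for d in docs:
    if d["srcip"] == "1.1.1.1":
        buckets[d["location"]].add(d["agent"])

# An agent present in both log-type buckets has the same source IP in
# both logs, which is the correlation the question wants.
both = set.intersection(*buckets.values())
print(sorted(both))  # ['agent_1']
```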

Elasticsearch missing bucket aggregation

Updated
I have the following Elasticsearch query, which gives me the results below, with an aggregation.
I tried following Andrey Borisko's example, but for the life of me I cannot get it working.
The main query, with a filter on companyId, finds all the fullNames matching 'Brenda'.
The companyId agg returns the best-matching companyId for the fullName 'brenda', based on the main filter.
My exact query:
GET employee-index/_search
{
"aggs": {
"companyId": {
"terms": {
"field": "companyId"
},
"aggs": {
"filtered": {
"filter": {
"multi_match": {
"fields": [
"fullName.edgengram",
"number"
],
"query": "brenda"
}
}
}
}
}
},
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": [
"fullName.edgengram",
"number"
],
"query": "brenda"
}
}
],
"filter": [
{
"terms": {
"companyId": [
3849,
3867,
3884,
3944,
3260,
4187,
3844,
2367,
158,
3176,
3165,
3836,
4050,
3280,
2298,
3755,
3854,
7161,
3375,
7596,
836,
4616
]
}
}
]
}
}
}
My exact result
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 8.262566,
"hits" : [
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "67207",
"_score" : 8.262566,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 158,
"fullName" : "Brenda Grey",
"companyTradingName" : "Sky Blue"
}
},
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "7061",
"_score" : 7.868355,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 158,
"fullName" : "Brenda Eaton",
"companyTradingName" : "Sky Blue"
}
},
{
"_index" : "employee-index",
"_type" : "_doc",
"_id" : "107223",
"_score" : 7.5100465,
"_source" : {
"companyGroupId" : 1595,
"companyId" : 3260,
"fullName" : "Brenda Bently",
"companyTradingName" : "Green Ice"
}
}
]
},
"aggregations" : {
"companyId" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "158",
"doc_count" : 2,
"filtered" : {
"doc_count" : 2
}
},
{
"key" : "3260",
"doc_count" : 1,
"filtered" : {
"doc_count" : 1
}
}
]
}
}
}
**This is how I want the filtered-companies results to look**
"aggregations": {
"companyId": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "158",
"doc_count": 2,
"filtered": {
"doc_count": 2 (<- 2 records found of brenda)
}
},
{
"key": "3260",
"doc_count": 1,
"filtered": {
"doc_count": 1 (<- 1 records found of brenda)
}
},
{
"key": "4616",
"doc_count": 0,
"filtered": {
"doc_count": 0 (<- 0 records found of brenda)
}
},
... and so on. Basically, I want all the other companies that are in the filtered list to be displayed with a doc_count of 0.
]
}
If I understood you correctly, you want to run an aggregation, or a part of an aggregation, independently from the query. In this case you should use a Global Aggregation.
UPDATE after your comment
In this case you need to use filter aggregation. So for example this type query (simplified your example) you have currently:
GET indexName/_search
{
"size": 0,
"query": {
"match": {
"firstName": "John"
}
},
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
}
}
}
}
becomes this:
GET indexName/_search
{
"size": 0,
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
},
"aggs": {
"filtered": {
"filter": {
"match": {
"firstName": "John"
}
}
}
}
}
}
}
the output will look slightly different though:
...
"aggregations" : {
"by_phone" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 260072,
"buckets" : [
{
"key" : "+9649400",
"doc_count" : 270,
"filtered" : {
"doc_count" : 0 <-- not John
}
},
{
"key" : "+8003000",
"doc_count" : 184,
"filtered" : {
"doc_count" : 3 <-- this is John
}
},
{
"key" : "+41025026",
"doc_count" : 168,
"filtered" : {
"doc_count" : 0 <-- not John
}
}
...
And now, if you need the results of the query as well, you have to wrap the aggregation in a global aggregation, like so:
GET indexName/_search
{
"size": 20,
"from": 0,
"query": {
"match": {
"firstName": "John"
}
},
"aggs": {
"all": {
"global": {},
"aggs": {
"by_phone": {
"terms": {
"field": "cellPhoneNumber"
},
"aggs": {
"filtered": {
"filter": {
"match": {
"firstName": "John"
}
}
}
}
}
}
}
}
}
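The difference between the two shapes can be sketched in plain Python: a query-scoped terms agg only buckets the matching docs, while the terms agg with a filter sub-agg buckets everything and counts the matches per bucket (names and values here are illustrative, mirroring the John example):

```python
from collections import Counter, defaultdict

# Illustrative docs; field names mirror the example above.
docs = [
    {"firstName": "John", "cellPhoneNumber": "+8003000"},
    {"firstName": "John", "cellPhoneNumber": "+8003000"},
    {"firstName": "Mary", "cellPhoneNumber": "+9649400"},
    {"firstName": "Mary", "cellPhoneNumber": "+8003000"},
]

# Query-scoped terms agg: only John's docs form buckets at all.
scoped = Counter(d["cellPhoneNumber"] for d in docs if d["firstName"] == "John")
print(dict(scoped))  # {'+8003000': 2}

# terms agg over all docs with a filter sub-agg: every phone gets a
# bucket, each carrying a per-bucket count of how many docs are John.
global_buckets = defaultdict(lambda: {"doc_count": 0, "filtered": 0})
for d in docs:
    b = global_buckets[d["cellPhoneNumber"]]
    b["doc_count"] += 1
    b["filtered"] += d["firstName"] == "John"
print(dict(global_buckets))
```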
Reviewed version based on your query:
GET employee-index/_search
{
"size": 0,
"aggs": {
"filtered": {
"filter": {
"bool": {
"filter": [
{
"terms": {
"companyId": [
3849,
3867,
3884,
3944,
3260,
4187,
3844,
2367,
158,
3176,
3165,
3836,
4050,
3280,
2298,
3755,
3854,
7161,
3375,
7596,
836,
4616
]
}
}
]
}
},
"aggs": {
"by_companyId": {
"terms": {
"field": "companyId",
"size": 1000
},
"aggs": {
"testing": {
"filter": {
"multi_match": {
"fields": [
"fullName"
],
"query": "brenda"
}
}
}
}
}
}
}
}
}

Elastic Search: Selecting multiple values in aggregates

In Elastic Search I have the following index with 'allocated_bytes', 'total_bytes' and other fields:
{
"_index" : "metrics-blockstore_capacity-2017_06",
"_type" : "datapoint",
"_id" : "AVzHwgsi9KuwEU6jCXy5",
"_score" : 1.0,
"_source" : {
"timestamp" : 1498000001000,
"resource_guid" : "2185d15c-5298-44ac-8646-37575490125d",
"allocated_bytes" : 1.159196672E9,
"resource_type" : "machine",
"total_bytes" : 1.460811776E11,
"machine" : "2185d15c-5298-44ac-8646-37575490125d"
}
}
I have the following query to:
1) get a point per 30-minute interval using a date_histogram,
2) group by the resource_guid field,
3) use a max aggregate to find the max value.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValue": {
"max": {
"field": "allocated_bytes"
}
}
}
},
"sumUnique": {
"sum_bucket": {
"buckets_path": "groupByField>maxValue"
}
}
}
}
}
}
But with this query I am able to get only allocated_bytes, whereas I need both allocated_bytes and total_bytes at each result point.
Following is the result from the above query:
{
"key_as_string" : "2017-06-20T21:00:00.000Z",
"key" : 1497992400000,
"doc_count" : 9,
"groupByField" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "2185d15c-5298-44ac-8646-37575490125d",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
}, {
"key" : "c3513cdd-58bb-4f8e-9b4c-467230b4f6e2",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156165632E9
}
}, {
"key" : "eff13403-9737-4d08-9dca-fb6c12c3a6fa",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
} ]
},
"sumUnique" : {
"value" : 3.468529664E9
}
}
I do need both allocated_bytes and total_bytes. How do I get multiple fields( allocated_bytes, total_bytes) for each point?
For example:
"sumUnique" : {
"Allocatedvalue" : 3.468529664E9,
"TotalValue" : 9.468529664E9
}
or like this:
"allocatedBytessumUnique" : {
"value" : 3.468529664E9
}
"totalBytessumUnique" : {
"value" : 9.468529664E9
},
You can just add another aggregation:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValueAllocated": {
"max": {
"field": "allocated_bytes"
}
},
"maxValueTotal": {
"max": {
"field": "total_bytes"
}
}
}
},
"sumUniqueAllocatedBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueAllocated"
}
},
"sumUniqueTotalBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueTotal"
}
}
}
}
}
}
I hope you are aware that sum_bucket calculates over sibling aggregations only; in this case it gives the sum of the max values, not the sum of total_bytes. If you want the sum of total_bytes, you can use a sum aggregation.
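That sibling-pipeline behaviour can be checked against the example response: sum_bucket with buckets_path "groupByField>maxValue" sums the per-resource max values, not the raw field values.

```python
# Per-resource max values taken from the example response buckets.
max_per_resource = {
    "2185d15c-5298-44ac-8646-37575490125d": 1.156182016e9,
    "c3513cdd-58bb-4f8e-9b4c-467230b4f6e2": 1.156165632e9,
    "eff13403-9737-4d08-9dca-fb6c12c3a6fa": 1.156182016e9,
}

# sum_bucket("groupByField>maxValue") == sum of the sibling buckets' maxes.
sum_unique = sum(max_per_resource.values())
print(sum_unique)  # 3468529664.0, matching "sumUnique" in the response
```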
