How to sort buckets by doc_count? - sorting

GET /civile/_search
{
"size": 0,
"query": {
"match": {
"distretto": "MI"
}
},
"aggs": {
"our_buckets": {
"composite": {
"size": 1000,
"sources": [
{ "codiceoggetto": { "terms": { "field": "codiceoggetto.keyword", "order": "desc" } } }
]
}
}
}
}
My Elasticsearch query matches documents by distretto = "MI".
With size = 0 I hide the hits.
The important part is the our_buckets aggregation I define:
it returns up to 1000 keys and does a "group by" on the codiceoggetto.keyword field.
Now I want to order the resulting buckets by doc_count. How can I do that?
Here is the response:
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"our_buckets" : {
"after_key" : {
"codiceoggetto" : "010001"
},
"buckets" : [
{
"key" : {
"codiceoggetto" : "490999"
},
"doc_count" : 3
},
{
"key" : {
"codiceoggetto" : "481312"
},
"doc_count" : 1
},
...

You can do it using a bucket_sort sub-aggregation:
{
"size": 0,
"query": {
"match": {
"distretto": "MI"
}
},
"aggs": {
"our_buckets": {
"composite": {
"size": 1000,
"sources": [
{
"codiceoggetto": {
"terms": {
"field": "codiceoggetto.keyword",
"order": "desc"
}
}
}
]
},
"aggs": {
"sort_by_count": {
"bucket_sort": {
"sort": [
{
"_count": {
"order": "desc"
}
}
]
}
}
}
}
}
}
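Note that bucket_sort is a pipeline aggregation, so it reorders the buckets the composite aggregation has already produced. If you do not actually need composite's after_key pagination, a plain terms aggregation gives you count ordering directly (its buckets are sorted by _count descending by default); a minimal sketch using the same field and filter as the question:
GET /civile/_search
{
  "size": 0,
  "query": {
    "match": {
      "distretto": "MI"
    }
  },
  "aggs": {
    "our_buckets": {
      "terms": {
        "field": "codiceoggetto.keyword",
        "size": 1000,
        "order": { "_count": "desc" }
      }
    }
  }
}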

Related

Query filter for searching rollup index works with epoch time, fails with date math

How do we query (filter) a rollup index?
For example, consider the following query.
Request:
{
"size": 0,
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
Response:
{
"took" : 93,
"timed_out" : false,
"terminated_early" : false,
"_shards" : ... ,
"hits" : {
"total" : {
"value": 0,
"relation": "eq"
},
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"timeline" : {
"buckets" : [
{
"key_as_string" : "2018-01-18T00:00:00.000Z",
"key" : 1516233600000,
"doc_count" : 6,
"nodes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 5.1499998569488525
}
},
{
"key" : "b",
"doc_count" : 2,
"max_temperature" : {
"value" : 201.0
},
"avg_voltage" : {
"value" : 5.700000047683716
}
},
{
"key" : "c",
"doc_count" : 2,
"max_temperature" : {
"value" : 202.0
},
"avg_voltage" : {
"value" : 4.099999904632568
}
}
]
}
}
]
}
}
}
How can I filter, say, the last 3 days? Is that possible?
As a test case I used a fixed_interval of 1m (one minute, and also 60 minutes) and tried the following, but the error was that all query shards failed. Is it possible to filter rollup aggregations with a query?
Test query for searching the rollup index:
{
"size": 0,
"query": {
"range": {
"timestamp": {
"gte": "now-3d/d",
"lt": "now/d"
}
}
},
"aggregations": {
"timeline": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "7d"
},
"aggs": {
"nodes": {
"terms": {
"field": "node"
},
"aggs": {
"max_temperature": {
"max": {
"field": "temperature"
}
},
"avg_voltage": {
"avg": {
"field": "voltage"
}
}
}
}
}
}
}
}
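For reference, the same filter written with explicit epoch-millisecond bounds (the form that, per the title, does work against the rollup index) would look roughly like this; the timestamp values below are placeholders, not taken from the original data, and the aggregation part is abbreviated:
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": 1516060800000,
        "lt": 1516320000000
      }
    }
  },
  "aggregations": {
    "timeline": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "7d"
      }
    }
  }
}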

How do I compare two source IPs from two specific logs in Elasticsearch

In Elasticsearch I want to compare two logs (the NAT log and the gateway log) with a DSL query.
In the NAT log there is srcip1, and in the gateway log there is srcip2.
If the condition srcip1 === srcip2 is satisfied, I want "agent.id" displayed in the result.
Below is the correlation query I have already made:
{
"query": {
"bool": {
"should": [
{
"match": {
"location": "\\Users\\Saad\\Desktop\\nat.log"
}
},
{
"match": {
"location": "\\Users\\Saad\\Desktop\\attendance-logs-with-ports.log"
}
}
],
"must": [
{
"term": {
"data.srcip": "1.1.1.1"
}
}
]
}
},
"fields": [
"data.srcip1"
],
"_source": false
}
I tried multiple things but did not succeed.
To display summaries of data you use aggregations. If you want to compare the agents per log type for a given IP, the query would look like this:
Ingest data
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\nat.log",
"data": {
"srcip1": "1.1.1.1"
},
"agent": {
"id": "agent_1"
}
}
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\attendance-logs-with-ports.log",
"data": {
"srcip2": "1.1.1.1"
},
"agent": {
"id": "agent_1"
}
}
POST test_saad/_doc
{
"location": "\\Users\\Saad\\Desktop\\nat.log",
"data": {
"srcip1": "1.1.1.1"
},
"agent": {
"id": "agent_2"
}
}
Request
POST test_saad/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"term": {
"data.srcip1.keyword": "1.1.1.2"
}
},
{
"term": {
"data.srcip2.keyword": "1.1.1.2"
}
}
],
"minimum_should_match": 1
}
},
{
"bool": {
"should": [
{
"term": {
"location.keyword": """\Users\Saad\Desktop\nat.log"""
}
},
{
"term": {
"location.keyword": """\Users\Saad\Desktop\attendance-logs-with-ports.log"""
}
}
],
"minimum_should_match": 1
}
}
]
}
},
"aggs": {
"log_types": {
"terms": {
"field": "location.keyword",
"size": 10
},
"aggs": {
"agent_types": {
"terms": {
"field": "agent.id.keyword",
"size": 10
}
}
}
}
}
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"log_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : """\Users\Saad\Desktop\nat.log""",
"doc_count" : 2,
"agent_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "agent_1",
"doc_count" : 1
},
{
"key" : "agent_2",
"doc_count" : 1
}
]
}
},
{
"key" : """\Users\Saad\Desktop\attendance-logs-with-ports.log""",
"doc_count" : 1,
"agent_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "agent_1",
"doc_count" : 1
}
]
}
}
]
}
}
}
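If the end goal is to see which agents have the same IP in both log types, you can also invert the grouping, putting agent.id first and location second; an agent whose bucket then contains both locations appears in both logs. A sketch against the same test_saad test index (srcip1/srcip2 field names as in the ingested documents):
POST test_saad/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        { "term": { "data.srcip1.keyword": "1.1.1.1" } },
        { "term": { "data.srcip2.keyword": "1.1.1.1" } }
      ],
      "minimum_should_match": 1
    }
  },
  "aggs": {
    "agents": {
      "terms": { "field": "agent.id.keyword", "size": 10 },
      "aggs": {
        "log_types": {
          "terms": { "field": "location.keyword", "size": 10 }
        }
      }
    }
  }
}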

Nested filter in Elasticsearch aggregation query

I am running the following aggregation query with a nested filter:
GET <indexname>/_search
{
"aggs": {
"NAME": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_filter": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.City.keyword": [
"Rewa"
]
}
}
}
},
{
"nested": {
"path": "crm",
"query": {
"terms": {
"crm.LeadID": [
27961
]
}
}
}
}
]
}
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
}
}
I have the following document:
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
However, in the response I am getting the following result:
"aggregations" : {
"NAME" : {
"doc_count" : 4332,
"agg_filter" : {
"doc_count" : 1,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
}
]
}
}
}
}
Question: As per the source document, I have 3 nested 'crm.LeadStatusHistory' documents for crm.LeadID = 27961. However, the result shows doc_count = 1 for agg_filter instead of 3. Can you please explain the reason for this?
Your agg_filter is applied in the crm.LeadStatusHistory context, so it will target only 1 doc (LeadStatusHistory is one doc which, in your case, contains the link to the other docs).
I built a query that shows this, and I think it will answer your problem. You will see the different doc_count for each aggregation.
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"home_type_name": {
"terms": {
"field": "crm.LeadStatusHistory.StatusID"
}
}
}
}
}
}
}
}
}
}
With this one you can count them with a script (and add a filter if needed):
{
"size": 0,
"aggs": {
"NAME": {
"nested": {
"path": "crm"
},
"aggs": {
"agg_LeadID": {
"terms": {
"field": "crm.LeadID"
},
"aggs": {
"agg_LeadStatusHistory": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_LeadStatusHistory_sum": {
"sum": {
"script": "doc['crm.LeadStatusHistory.StatusID'].values.length"
}
}
}
}
}
}
}
}
}
}
Note: if you want to get the number of nested documents, take a look at inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-inner-hits
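For example, adding inner_hits to a nested query returns the matching nested documents (and their total) with each hit; a minimal sketch, assuming crm and crm.LeadStatusHistory are both mapped as nested:
GET crm/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "crm",
      "query": {
        "bool": {
          "must": [
            { "term": { "crm.LeadID": 27961 } },
            {
              "nested": {
                "path": "crm.LeadStatusHistory",
                "query": { "match_all": {} },
                "inner_hits": {}
              }
            }
          ]
        }
      }
    }
  }
}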
I disagree with the answer's claim that 'crm.LeadStatusHistory' is one doc. I have run an aggregation query on crm.LeadStatusHistory without filters.
GET crm/_search
{
"_source": ["crm.LeadID","crm.LeadStatusHistory.StatusID","crm.City"],
"size": 10000,
"query": {
"nested": {
"path": "crm",
"query": {
"match": {
"crm.LeadID": "27961"
}
}
}
},
"aggs": {
"agg_statuscount": {
"nested": {
"path": "crm.LeadStatusHistory"
},
"aggs": {
"agg_terms":{
"terms": {
"field": "crm.LeadStatusHistory.StatusID",
"size": 1000
}
}
}
}
}
}
I get the following response from the above query, which shows 'agg_statuscount' with 6 docs when no filters are applied:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "crm",
"_type" : "_doc",
"_id" : "4478",
"_score" : 1.0,
"_source" : {
"crm" : [
{
"LeadStatusHistory" : [
{
"StatusID" : 3
},
{
"StatusID" : 2
},
{
"StatusID" : 1
}
],
"LeadID" : 27961,
"City" : "Rewa"
},
{
"LeadStatusHistory" : [
{
"StatusID" : 1
},
{
"StatusID" : 3
},
{
"StatusID" : 2
}
],
"LeadID" : 27959,
"City" : "Rewa"
}
]
}
}
]
},
"aggregations" : {
"agg_statuscount" : {
"doc_count" : 6,
"agg_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 2
},
{
"key" : 2,
"doc_count" : 2
},
{
"key" : 3,
"doc_count" : 2
}
]
}
}
}
}
Hence, with crm.LeadID = 27961 in the aggregation filter, I expected 3 'crm.LeadStatusHistory' docs. Currently the response shows 1, as in my original question.
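For what it's worth, one way to get the expected per-lead StatusID breakdown is to filter inside the crm nested context first and only then step down into crm.LeadStatusHistory; a sketch, assuming both paths are mapped as nested:
GET crm/_search
{
  "size": 0,
  "aggs": {
    "crm_docs": {
      "nested": { "path": "crm" },
      "aggs": {
        "only_lead_27961": {
          "filter": { "term": { "crm.LeadID": 27961 } },
          "aggs": {
            "status_history": {
              "nested": { "path": "crm.LeadStatusHistory" },
              "aggs": {
                "agg_terms": {
                  "terms": { "field": "crm.LeadStatusHistory.StatusID", "size": 1000 }
                }
              }
            }
          }
        }
      }
    }
  }
}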

Elasticsearch: Selecting multiple values in aggregates

In Elasticsearch I have the following index with 'allocated_bytes', 'total_bytes' and other fields:
{
"_index" : "metrics-blockstore_capacity-2017_06",
"_type" : "datapoint",
"_id" : "AVzHwgsi9KuwEU6jCXy5",
"_score" : 1.0,
"_source" : {
"timestamp" : 1498000001000,
"resource_guid" : "2185d15c-5298-44ac-8646-37575490125d",
"allocated_bytes" : 1.159196672E9,
"resource_type" : "machine",
"total_bytes" : 1.460811776E11,
"machine" : "2185d15c-5298-44ac-8646-37575490125d"
}
}
I have the following query to:
1) get a point for each 30-minute interval using a date_histogram,
2) group by the resource_guid field,
3) use a max aggregation to find the max value.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValue": {
"max": {
"field": "allocated_bytes"
}
}
}
},
"sumUnique": {
"sum_bucket": {
"buckets_path": "groupByField>maxValue"
}
}
}
}
}
}
With this query I am able to get only allocated_bytes, but I need both allocated_bytes and total_bytes in each result point.
Following is the result from the above query:
{
"key_as_string" : "2017-06-20T21:00:00.000Z",
"key" : 1497992400000,
"doc_count" : 9,
"groupByField" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "2185d15c-5298-44ac-8646-37575490125d",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
}, {
"key" : "c3513cdd-58bb-4f8e-9b4c-467230b4f6e2",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156165632E9
}
}, {
"key" : "eff13403-9737-4d08-9dca-fb6c12c3a6fa",
"doc_count" : 3,
"maxValue" : {
"value" : 1.156182016E9
}
} ]
},
"sumUnique" : {
"value" : 3.468529664E9
}
}
I need both allocated_bytes and total_bytes. How do I get multiple fields (allocated_bytes, total_bytes) for each point?
For example:
"sumUnique" : {
"Allocatedvalue" : 3.468529664E9,
"TotalValue" : 9.468529664E9
}
or like this:
"allocatedBytessumUnique" : {
"value" : 3.468529664E9
}
"totalBytessumUnique" : {
"value" : 9.468529664E9
},
You can just add another aggregation:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": 1497992400000,
"lte": 1497996000000
}
}
}
]
}
},
"aggregations": {
"groupByTime": {
"date_histogram": {
"field": "timestamp",
"interval": "30m",
"order": {
"_key": "desc"
}
},
"aggregations": {
"groupByField": {
"terms": {
"size": 1000,
"field": "resource_guid"
},
"aggregations": {
"maxValueAllocated": {
"max": {
"field": "allocated_bytes"
}
},
"maxValueTotal": {
"max": {
"field": "total_bytes"
}
}
}
},
"sumUniqueAllocatedBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueAllocated"
}
},
"sumUniqueTotalBytes": {
"sum_bucket": {
"buckets_path": "groupByField>maxValueTotal"
}
}
}
}
}
}
Note that sum_bucket works on sibling aggregations only, so in this case it gives the sum of the max values, not the sum of total_bytes. If you want the sum of total_bytes, you can use a sum aggregation instead.
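For example, a direct sum of total_bytes per time bucket (as opposed to the sum of per-resource maxima that sum_bucket produces above) would be a plain sum sub-aggregation; a minimal sketch:
{
  "size": 0,
  "aggregations": {
    "groupByTime": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "30m"
      },
      "aggregations": {
        "totalBytesSum": {
          "sum": { "field": "total_bytes" }
        }
      }
    }
  }
}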

Why does Elasticsearch not support min_doc_count together with order by _count asc?

Requirements:
group by hldId having count(*) = 2
Elasticsearch query:
"aggs": {
"groupByHldId": {
"terms": {
"field": "hldId",
"min_doc_count": 2,
"order" : { "_count" : "asc" }
}
}
}
But no buckets are returned:
"aggregations" : {
"groupByHldId" : {
"doc_count_error_upper_bound" : -1,
"sum_other_doc_count" : 2660,
"buckets" : [ ]
}
}
But if the order is changed to desc, it returns buckets:
"buckets" : [
{
"key" : 200035075,
"doc_count" : 355
},
Or without min_doc_count, it also returns buckets:
"buckets" : [
{
"key" : 200000061,
"doc_count" : 1
},
So why does it return empty buckets when both min_doc_count and ascending order are used?
You can try it like this: a bucket_selector aggregation with a custom script.
{
"aggs": {
"countfield": {
"terms": {
"field": "hldId",
"size": 100,
"order": {
"_count": "desc"
}
},
"aggs": {
"criticals": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": "params.doc_count==2"
}
}
}
}
}
}
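If you also need the surviving buckets in ascending count order, a bucket_sort can sit next to the bucket_selector as another pipeline sub-aggregation of the same terms aggregation; a sketch:
{
  "aggs": {
    "countfield": {
      "terms": {
        "field": "hldId",
        "size": 100
      },
      "aggs": {
        "criticals": {
          "bucket_selector": {
            "buckets_path": { "doc_count": "_count" },
            "script": "params.doc_count == 2"
          }
        },
        "asc_by_count": {
          "bucket_sort": {
            "sort": [ { "_count": { "order": "asc" } } ]
          }
        }
      }
    }
  }
}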
