Aggregation using Elasticsearch

I have the following search query, which fetches the latest 5000 documents from my Elasticsearch index:
{
"size": 5000,
"from": 0,
"query": {
"range" : {
"hostTimestamp" : {
"gte" : 1499674634382,
"lte" : 1499680034000
}
}
},
"sort": [
{
"hostTimestamp": {
"order": "desc"
}
}
]
}
Among the documents returned by this query, I want to count the number of documents whose eventSeverity is Alert or Critical. How can this be achieved?

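You can achieve that by adding an "aggs" section with a terms aggregation on the eventSeverity field. It returns one bucket per severity value, each with its document count: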
{
"size": 5000,
"from": 0,
"query": {
"range" : {
"hostTimestamp" : {
"gte" : 1499674634382,
"lte" : 1499680034000
}
}
},
"sort": [
{
"hostTimestamp": {
"order": "desc"
}
}
],
"aggs": {
"severities": {
"terms": {
"field": "eventSeverity"
}
}
}
}
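The terms aggregation returns a count for every severity, so the Alert and Critical buckets still have to be added up on the client. Below is a minimal Python sketch of that step. The `sample_response` dict and its counts are made up for illustration, and `count_severities` is a hypothetical helper, not part of any client library. Keep in mind that aggregations are computed over every document matching the query, not only the 5000 hits that are returned, and that the bucket keys may be lowercased if eventSeverity is an analyzed text field.

```python
# Hypothetical response fragment; the bucket keys and counts are made up.
sample_response = {
    "aggregations": {
        "severities": {
            "buckets": [
                {"key": "Info", "doc_count": 4200},
                {"key": "Alert", "doc_count": 310},
                {"key": "Critical", "doc_count": 45},
            ]
        }
    }
}


def count_severities(response, wanted=("Alert", "Critical")):
    """Sum doc_count over the 'severities' buckets whose key is in `wanted`."""
    buckets = response["aggregations"]["severities"]["buckets"]
    return sum(b["doc_count"] for b in buckets if b["key"] in wanted)


alert_or_critical = count_severities(sample_response)
print(alert_or_critical)
```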

Related

Elastic Search - Pagination on Aggregations

I have an index and I run an aggregation against it. Instead of returning the whole aggregation at once, I want it returned in small chunks. Is this possible in Elasticsearch?
Try the bucket_sort pipeline aggregation, which sorts the buckets of its parent aggregation and can page through them with "from" and "size":
POST /sales/_search
{
"size": 0,
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"total_sales": {
"sum": {
"field": "price"
}
},
"sales_bucket_sort": {
"bucket_sort": {
"sort": [
{"total_sales": {"order": "desc"}}
],
"size": 3,
"from": 10
}
}
}
}
}
}
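To walk through the buckets page by page, only "from" and "size" need to change between requests. The sketch below builds the same request body in Python for a given page; `paged_bucket_sort_body` and its zero-based `page` parameter are assumptions made for illustration, and sending the body to the cluster is left out.

```python
def paged_bucket_sort_body(page, page_size):
    """Build the request body for one page of monthly sales buckets.

    `page` is zero-based, so page 0 starts at bucket offset 0.
    """
    return {
        "size": 0,
        "aggs": {
            "sales_per_month": {
                "date_histogram": {"field": "date", "interval": "month"},
                "aggs": {
                    "total_sales": {"sum": {"field": "price"}},
                    "sales_bucket_sort": {
                        "bucket_sort": {
                            "sort": [{"total_sales": {"order": "desc"}}],
                            "from": page * page_size,
                            "size": page_size,
                        }
                    },
                },
            }
        },
    }


# Third page (zero-based index 2) with three buckets per page.
body = paged_bucket_sort_body(page=2, page_size=3)
```

Note that the parent date_histogram still computes all of its buckets on every request; bucket_sort only trims what is returned.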

How can I aggregate over the _score?

I tried to run an aggregation over the _score field in Elasticsearch, but got no results. It seems the _score field cannot be used, perhaps because it is not a field of the document. How can I aggregate over _score?
This is my query:
{
"_source": false, "explain": false, "from": 0, "size": 0,
"aggs" : {
"score_ranges" : {
"range" : {
"field" : "_score",
"ranges" : [
{ "to" : 50 },
{ "from" : 50, "to" : 75 },
{ "from" : 75 }
]
}
}
},
"query": {
"function_score": {
"query": {
"match_all": { }
}
}
}
}
Since _score is not a document field, access it through a script instead, for example in a histogram aggregation:
"aggs": {
"scores_histogram": {
"histogram": {
"script": "return _score.doubleValue() * 10",
"interval": 3
}
}
}
or, with ranges:
"aggs": {
"score_ranges": {
"range": {
"script": "_score",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 75
},
{
"from": 75
}
]
}
}
}
Note that you need to enable dynamic scripting for this to work.
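To make the bucket boundaries of the range variant concrete, here is a small Python sketch that bins a list of scores the same way a range aggregation does: each range includes its "from" value and excludes its "to" value. The function name, the bucket labels, and the sample scores are assumptions for illustration; the real response uses its own bucket keys.

```python
def score_range_counts(scores):
    """Count scores per range: [.., 50), [50, 75), [75, ..)."""
    counts = {"*-50": 0, "50-75": 0, "75-*": 0}
    for score in scores:
        if score < 50:
            counts["*-50"] += 1
        elif score < 75:
            counts["50-75"] += 1
        else:
            counts["75-*"] += 1
    return counts


# Made-up scores: 50.0 falls in the middle range, 75.0 in the last one.
counts = score_range_counts([12.0, 50.0, 74.9, 75.0, 90.5])
print(counts)
```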

How do I get the latest document grouped by a field?

I have an index with many documents in this format:
{
"userId": 1234,
"locationDate": "2016-07-19T19:24:51+0000",
"location": {
"lat": -47.38163,
"lon": 26.38916
}
}
This index holds incremental positions for each user, updated every few seconds.
I would like to run a search that returns the latest position (sorted by locationDate) for each user (grouped by userId).
Is this possible with Elasticsearch? The best I could do was to get all the positions from the last 30 seconds, using this:
{"query":{
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"range" : {
"locationDate" : {
"from" : "2016-07-19T18:54:51+0000",
"to" : null,
"include_lower" : true,
"include_upper" : true
}
}
}
}
}}
After that I sort them out by hand, but I would like to do this directly in Elasticsearch.
IMPORTANT: I am using Elasticsearch 1.5.2.
Try this, using a terms aggregation on userId with a top_hits sub-aggregation:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"range": {
"locationDate": {
"from": "2016-07-19T18:54:51+0000",
"to": null,
"include_lower": true,
"include_upper": true
}
}
}
}
},
"aggs": {
"byUser": {
"terms": {
"field": "userId",
"size": 10
},
"aggs": {
"firstOne": {
"top_hits": {
"size": 1,
"sort": [
{
"locationDate": {
"order": "desc"
}
}
]
}
}
}
}
}
}
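Each userId bucket in the response carries its single newest document under the "firstOne" sub-aggregation. The sketch below shows one way to flatten that into a dict keyed by user. The `sample_response` is a made-up fragment that follows the usual terms/top_hits response layout, and `latest_location_by_user` is a hypothetical helper.

```python
# Hypothetical response fragment with a single user bucket.
sample_response = {
    "aggregations": {
        "byUser": {
            "buckets": [
                {
                    "key": 1234,
                    "doc_count": 7,
                    "firstOne": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "userId": 1234,
                                        "locationDate": "2016-07-19T19:24:51+0000",
                                        "location": {"lat": -47.38163, "lon": 26.38916},
                                    }
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}


def latest_location_by_user(response):
    """Map each userId bucket key to the _source of its top hit."""
    result = {}
    for bucket in response["aggregations"]["byUser"]["buckets"]:
        hits = bucket["firstOne"]["hits"]["hits"]
        if hits:  # skip buckets without a top hit
            result[bucket["key"]] = hits[0]["_source"]
    return result


latest = latest_location_by_user(sample_response)
```

The "size": 10 in the terms aggregation limits the result to 10 user buckets, so it would need to be raised to cover more users.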

Count how many documents have an attribute or are missing that attribute in Elasticsearch

How can I write a single Elasticsearch query that will count how many documents either have a value for a field or are missing that field?
This query successfully counts the docs missing the field:
POST localhost:9200/<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field" : {
"missing": { "field": "group_doc_groupset_id" }
}
}
}
This query does the opposite, counting documents NOT missing the field:
POST localhost:9200/<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Not_Missing_Field" : {
"exists": { "field": "group_doc_groupset_id" }
}
}
}
How can I write one that combines both? For example, this yields a syntax error:
POST localhost:9200/<index_name_here>/_search
{
"size": 0,
"aggs" : {
"Missing_Field_Or_Not" : {
"missing": { "field": "group_doc_groupset_id" },
"exists": { "field": "group_doc_groupset_id" }
}
}
}
Use two sibling aggregations, each with its own name. Since exists is a query rather than an aggregation, wrap it in a filter aggregation:
GET indexname/_search?size=0
{
"aggs": {
"a1": {
"missing": {
"field": "status"
}
},
"a2": {
"filter": {
"exists": {
"field": "status"
}
}
}
}
}
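Both "a1" and "a2" are single-bucket aggregations, so each comes back as an object with a doc_count. A minimal Python sketch of reading the two counts follows; the `sample_response` numbers are made up and `missing_and_present` is a hypothetical helper.

```python
# Hypothetical response fragment; the counts are made up.
sample_response = {
    "aggregations": {
        "a1": {"doc_count": 120},  # documents missing the field
        "a2": {"doc_count": 880},  # documents that have the field
    }
}


def missing_and_present(response):
    """Return (missing_count, present_count) from the two aggregations."""
    aggs = response["aggregations"]
    return aggs["a1"]["doc_count"], aggs["a2"]["doc_count"]


missing, present = missing_and_present(sample_response)
print(missing, present)
```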
As per the newer Elasticsearch recommendation in the docs, use an exists query inside a bool query:
GET {your_index_name}/_search #or _count, to see just the value
{
"query": {
"bool": {
"must_not": { # here can be also "must"
"exists": {
"field": "{field_to_be_searched}"
}
}
}
}
}
Edit: _count returns the exact number of matching documents. With _search, if more than 10,000 documents match, the total is reported as a lower bound:
"hits" : {
"total" : {
"value" : 10000, # 10k
"relation" : "gte" # greater than or equal to
}
}
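Client code that reads the total should therefore check "relation" before treating "value" as exact. The following is a small sketch under that assumption; `describe_total` is a hypothetical helper, and the integer branch is there because older Elasticsearch versions (before 7.0) return "total" as a plain number.

```python
def describe_total(hits):
    """Render hits.total as text, marking lower bounds as such."""
    total = hits["total"]
    if isinstance(total, int):
        # Older response format: total is a plain, exact integer.
        return str(total)
    if total["relation"] == "gte":
        # The real count is at least `value`.
        return "at least {}".format(total["value"])
    return str(total["value"])


print(describe_total({"total": {"value": 10000, "relation": "gte"}}))
```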

Post filter on subaggregation in elasticsearch

I am trying to run a post filter on the aggregated data, but it is not working as I expected. Can someone review my query and tell me if I am doing anything wrong here?
"query" : {
"bool" : {
"must" : {
"range" : {
"versionDate" : {
"from" : null,
"to" : "2016-04-22T23:13:50.000Z",
"include_lower" : false,
"include_upper" : true
}
}
}
}
},
"aggregations" : {
"associations" : {
"terms" : {
"field" : "association.id",
"size" : 0,
"order" : {
"_term" : "asc"
}
},
"aggregations" : {
"top" : {
"top_hits" : {
"from" : 0,
"size" : 1,
"_source" : {
"includes" : [ ],
"excludes" : [ ]
},
"sort" : [ {
"versionDate" : {
"order" : "desc"
}
} ]
}
},
"disabledDate" : {
"filter" : {
"missing" : {
"field" : "disabledDate"
}
}
}
}
}
}
}
Steps in the query:
1. Filter by indexDate less than or equal to a given date.
2. Aggregate on formId, forming one bucket per formId.
3. Sort in descending order and return the top hit per bucket.
4. Run a sub-aggregation filter after the sort sub-aggregation to remove all documents from the buckets where the disabled date is not null. (This is the part that is not working.)
The whole purpose of post_filter is to run after aggregations have been computed. As such, post_filter has no effect whatsoever on aggregation results.
What you can do in your case is to apply a top-level filter aggregation so that documents with no disabledDate are not taken into account in aggregations, i.e. consider only documents with disabledDate.
{
"query": {
"bool": {
"must": {
"range": {
"versionDate": {
"from": null,
"to": "2016-04-22T23:13:50.000Z",
"include_lower": true,
"include_upper": true
}
}
}
}
},
"aggregations": {
"with_disabled": {
"filter": {
"exists": {
"field": "disabledDate"
}
},
"aggs": {
"form.id": {
"terms": {
"field": "form.id",
"size": 0
},
"aggregations": {
"top": {
"top_hits": {
"size": 1,
"_source": {
"includes": [],
"excludes": []
},
"sort": [
{
"versionDate": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
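The order of operations in this query matters: the filter runs first, and only then is the newest document picked per form.id. The Python sketch below mimics that order on a few in-memory documents so the effect is easy to see. The sample documents and the `latest_with_disabled_date` helper are made up for illustration, and the string comparison on versionDate assumes all dates share the same ISO 8601 format.

```python
def latest_with_disabled_date(docs):
    """Keep docs that have disabledDate, then pick the newest per form id."""
    latest = {}
    for doc in docs:
        if doc.get("disabledDate") is None:
            continue  # filtered out before any grouping happens
        form_id = doc["form"]["id"]
        current = latest.get(form_id)
        # Same-format ISO 8601 strings sort chronologically.
        if current is None or doc["versionDate"] > current["versionDate"]:
            latest[form_id] = doc
    return latest


# Made-up documents: the newest f1 version has no disabledDate.
docs = [
    {"form": {"id": "f1"}, "versionDate": "2016-04-01T00:00:00.000Z",
     "disabledDate": "2016-04-02T00:00:00.000Z"},
    {"form": {"id": "f1"}, "versionDate": "2016-04-10T00:00:00.000Z"},
    {"form": {"id": "f2"}, "versionDate": "2016-04-05T00:00:00.000Z"},
]
result = latest_with_disabled_date(docs)
```

In this sample the bucket for f1 holds the older version, because the newer one was removed by the filter before the top hit was chosen, and f2 has no bucket at all.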
