How to group documents by hours in elastic search aggregation? - elasticsearch

I tried to group my documents by hour for a single day using an aggregation, but I always get the exception "expected field name but got [START_OBJECT]". What's the problem?
{
"query" : {
"bool" : {
"must" : {
"range" : {
"timestamp" : {
"from" : "2017-08-14 00:00:00",
"to" : "2017-08-15 00:00:00",
"include_lower" : true,
"include_upper" : true
}
}
}
}
},
"aggs": {
"result_by_hours": {
"histogram": {
"script": "doc.timestamp.date.getHourOfDay()",
"interval": 1
}
}
}
}
What I expect is the number of documents for each hour of yesterday. How can I use a dynamic, relative time range instead of the hard-coded "2017-08-14 - 2017-08-15"?
Thanks in advance:)

Depending on your ES version, you can use a range filter/query relative to "now"; for example, now-1d/d goes one day back in time and rounds to the start of that day.
See examples at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
As for the aggs, you can group by an interval of, for instance, one hour using date_histogram with interval.
In ES 5.5:
Query:
"range" : {
"timestamp" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
Aggs:
{
"aggs" : {
"values_over_time" : {
"date_histogram" : {
"field" : "timestamp",
"interval" : "1h"
}
}
}
}
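Putting the two fragments together, a minimal sketch of the full request body could look like the following. It is written as a Python dict only so that it can be printed and checked. The function name hourly_counts_body is made up for illustration, "size": 0 is an addition that suppresses the hits so only the buckets come back, and the upper bound uses "lt" on the assumption that today's documents should be excluded. The "interval" key follows the ES 5.5 syntax above; newer versions may expect a different parameter name.

```python
import json


def hourly_counts_body(field="timestamp"):
    """Build a search body that counts yesterday's documents per hour.

    `field` is assumed to be mapped as a date in the index.
    """
    return {
        "size": 0,  # assumption: only the aggregation buckets are needed
        "query": {
            "range": {
                # from the start of yesterday up to (not including) the start of today
                field: {"gte": "now-1d/d", "lt": "now/d"}
            }
        },
        "aggs": {
            "values_over_time": {
                "date_histogram": {"field": field, "interval": "1h"}
            }
        },
    }


body = hourly_counts_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index in question.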

Related

ElasticSearch: Sort Aggregations by Filtered Average

I have an ElasticSearch index with documents structured like this:
"created": "2019-07-31T22:44:41.437Z",
"id": "2956",
"rating": 1
If I wish to create an aggregation of the id fields which is sorted on the average of the rating, that could be handled by:
{
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}
However, I'm looking to only factor in documents which have a created value that was within the last week (and then take the average of those rating fields).
My naive thoughts on this would be to apply a filter or range within the sort aggregation, but an aggregation cannot have multiple types, and looking through the avg documentation, I don't see a means to put it in the avg. Optimistically attempting to put range fields in the avg regardless of what the documentation says yielded no results (as expected).
How would I go about achieving this?
Try adding a bool query to the body with a range query:
{
"query" : {
"bool" : {
"must" : {
"range" : {
"created" : {
"gte" : one_week_ago
}
}
}
}
},
"aggs" : {
"sorted" : {
"terms" : {
"field" : "id",
"order" : { "sort" : "asc" }
},
"aggs" : {
"sort" : {
"avg" : {
"field" : "rating"
}
}
}
}
}
}
You can also query for dynamic dates, as Tom mentioned, by using "now-7d/d":
{
"query" : {
"bool" : {
"must" : {
"range" : {
"created" : {
"gte" : "now-7d/d"
}
}
}
}
}
}
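If the terms buckets should still be built from all documents and only the average should be restricted to the last week, one alternative is to wrap the avg in a filter sub-aggregation and order the terms by the path to it. The sketch below builds such a body as a Python dict. The aggregation names recent and avg_rating and the function name are hypothetical, and the field names created, id and rating are taken from the question. How ids without any recent documents are ordered is not covered here and should be checked against your ES version.

```python
import json


def sorted_by_recent_avg(window="now-7d/d"):
    """Terms on `id`, ordered by the average rating of recent documents only."""
    return {
        "size": 0,  # assumption: only the buckets are needed
        "aggs": {
            "sorted": {
                "terms": {
                    "field": "id",
                    # path syntax: <single-bucket agg>><metric agg>
                    "order": {"recent>avg_rating": "asc"},
                },
                "aggs": {
                    "recent": {
                        # single-bucket filter restricting the docs seen by the avg
                        "filter": {"range": {"created": {"gte": window}}},
                        "aggs": {"avg_rating": {"avg": {"field": "rating"}}},
                    }
                },
            }
        },
    }


body = sorted_by_recent_avg()
print(json.dumps(body, indent=2))
```

Compared with the query-level range above, this keeps every id in the result and limits only the documents that feed the average.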

Elasticsearch - EXISTS syntax + Filter not working

I am trying to query for a date range where a particular field exists. This seems like it should be easy, but judging by the documentation I suspect the "exists" syntax has changed. I am on 5.4. https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-exists-filter.html
I use @timestamp for dates, and the field "error_data" is in the mapping but only appears when an error condition is found.
Here is my query:
GET /filebeat-2017.07.25/_search
{
"query": {
"bool" : {
"filter" : {
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
},
"exists": {
"field": "error_data"
}
}
}
}
but it says "[bool] query does not support [exists]". The following does not work either; it fails with the parsing error "[exists] malformed query, expected [END_OBJECT] but found [FIELD_NAME]" on line 6, column 9. Thanks for your help.
GET /filebeat-2017.07.25/_search
{
"query": {
"exists": {
"field": "error_data"
},
"bool" : {
"filter" : {
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
}
}
}
}
You're almost there. Try like this:
GET /filebeat-2017.07.25/_search
{
"query": {
"bool" : {
"filter" : [
{
"range" : {
"@timestamp" : {
"gte" : "now-5m",
"lte" : "now-1m"
}
}
},
{
"exists": {
"field": "error_data"
}
}
]
}
}
}
That is, the bool/filter clause must be an array if you have several clauses to put in it.
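When such bodies are assembled in code, a small helper can make the "array of clauses" shape hard to get wrong. This is only a sketch: bool_filter is a hypothetical name, and the date field is assumed to be @timestamp, as Filebeat normally names it.

```python
import json


def bool_filter(*clauses):
    """Wrap any number of query clauses in a bool/filter array."""
    return {"query": {"bool": {"filter": list(clauses)}}}


body = bool_filter(
    # assumption: the date field is named @timestamp
    {"range": {"@timestamp": {"gte": "now-5m", "lte": "now-1m"}}},
    {"exists": {"field": "error_data"}},
)
print(json.dumps(body, indent=2))
```

Each clause stays a complete query object of its own, and the list is what lets bool/filter hold more than one of them.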

Elasticsearch - Remove double results in search

I don't know how to remove duplicate results that have the same value in one field.
My search query:
"query" : {
"range" : {
"endtime" : {
"lt" : "2017-02-09T20:00:00",
"gt" : "2017-02-09T01:00:00"
}
}
}
In my results there is a field called "link" which often has the same value (e.g. https://www.facebook.com).
I would prefer a solution that works within my query.
Thanks.
Greetings!
You can do a terms aggregation, something like this:
GET /cars/transactions/_search
{
"size": 0,
"query": {
"range" : {
"endtime" : {
"gt" : "2017-02-09T01:00:00",
"lt" : "2017-02-09T20:00:00"
}
}
},
"aggs": {
"distinct_links": {
"terms": {
"field": "link",
"size": 100
}
}
}
}
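The distinct values then come back as bucket keys under aggregations in the response rather than as hits. A minimal sketch of reading them out, assuming the response has already been parsed into a Python dict; the sample response below is made up for illustration and is not real output.

```python
def distinct_values(response, agg_name="distinct_links"):
    """Return the bucket keys of a terms aggregation from a parsed search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Illustrative response only; the keys and counts are invented.
sample_response = {
    "aggregations": {
        "distinct_links": {
            "buckets": [
                {"key": "https://www.facebook.com", "doc_count": 3},
                {"key": "https://example.org", "doc_count": 1},
            ]
        }
    }
}

links = distinct_values(sample_response)
print(links)
```

Each bucket also carries a doc_count, which tells you how many documents shared that value.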

Range query in elasticsearch does not work properly

I have an index that contains eventvalue/eventtime objects. I want to write a query that returns the event count aggregated by eventvalue for the last 30 seconds. I also need empty buckets for any second in which there were no events, because I need to display this data on a graph.
So I wrote the following query:
{
"query" : {
"bool" : {
"must" : [
{
"range" : {
"eventtime" : {
"gte" : "now-30s/s",
"lte" : "now/s",
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
},
{
"range" : {
"eventvalue" : {
"lte" : 3
}
}
}
]
}
},
"aggs": {
"values_agg": {
"terms": {
"field": "eventvalue",
"min_doc_count" : 0,
"order": {
"_term": "asc"
}
},
"aggs": {
"events_over_time" : {
"date_histogram" : {
"field" : "eventtime",
"interval" : "1s",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "now-30s/s",
"max" : "now/s"
},
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
}
}
}
}
This query is not working properly and I don't know why. Specifically, the first "range" query gives me desired interval (if I remove it I'm getting values from all time). But the second "range" query seems to have no effect. Eventvalue can be anywhere from 1 to 10 and the desired effect is that I will have three buckets for eventvalues 1-3. However, I get all 10 buckets with all events.
How can I fix this query so that it still returns empty buckets, but only for the selected eventvalues?
I believe you need to remove the "min_doc_count": 0 from your terms aggregation. To achieve the empty buckets you're aiming for, you need only use min_doc_count in the date_histogram aggregation.
Per the documentation for the terms aggregation:
Setting min_doc_count=0 will also return buckets for terms that didn’t match any hit.
This explains why you are seeing buckets for eventvalues that are greater than 3. They were filtered out by the query, but brought back in by the terms aggregation.
UPDATE
Since there is a possibility that some eventvalues may not occur anywhere in the 30-second time slice, the other approach I would recommend is to manually specify the discrete values you want to use as buckets with a filters aggregation. See the documentation for the filters aggregation.
Try using this for your aggregations:
"aggs": {
"values_agg": {
"filters": {
"filters": {
"1": { "term": { "eventvalue": 1 }},
"2": { "term": { "eventvalue": 2 }},
"3": { "term": { "eventvalue": 3 }}
}
},
"aggs": {
"events_over_time" : {
"date_histogram" : {
"field" : "eventtime",
"interval" : "1s",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "now-30s/s",
"max" : "now/s"
},
"format" : "yyyy-MM-dd HH:mm:ss",
"time_zone": "+03:00"
}
}
}
}
}
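If the list of values is longer or changes over time, the named filters can be generated rather than typed out. A small sketch, with filters_by_value as a hypothetical helper name; it produces the same "filters" structure as above for eventvalues 1 to 3.

```python
import json


def filters_by_value(field, values):
    """Build a filters aggregation with one named term filter per value."""
    return {
        "filters": {
            # bucket names must be strings, so each value is stringified for its key
            "filters": {str(value): {"term": {field: value}} for value in values}
        }
    }


values_agg = filters_by_value("eventvalue", [1, 2, 3])
print(json.dumps(values_agg, indent=2))
```

The date_histogram sub-aggregation can then be attached under an "aggs" key next to "filters", exactly as in the body above.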

Query multiple fields by date and IP in Elasticsearch

My Elasticsearch data is loaded from JSON documents like the one below.
I want to get the max value of cpu0 and in_eth1 for every IP, sorted by date. Can someone help me with the following query?
{
"ip":"10.235.13.172",
"date":"2015-11-09",
"time":"18:30:00",
"cpu0":7,
"cpu13":2,
"cpu14":1,
"diskio(%)":0,
"memuse(MB)":824,
"in_eth1(Mbps)":34
}
"aggs": {
"events_by_date": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs" : {
"genders" : {
"terms" : {
"field" : "ip",
"size": 100000,
"order" : { "_count" : "asc" }
},
"aggs" : {
"maxcpu" : { "max" : { "field" : "cpu(%)" } },
"maxin" : { "max" : { "field" : "in_eth1(Mbps)" } }
}
}
}
}
}
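One way the request could be shaped, as a hedged sketch only: the body below buckets by day, then by IP, and takes the max of two metrics per IP. It assumes the field names from the sample document (cpu0 and in_eth1(Mbps)), that date is mapped as a date and ip as a non-analyzed field, and it uses made-up names (max_per_ip_by_day, per_ip) and an arbitrary terms size. A date_histogram returns its buckets in ascending date order by default, so no explicit order is set.

```python
import json


def max_per_ip_by_day(ip_bucket_size=1000):
    """Per day, per IP: max of cpu0 and in_eth1(Mbps)."""
    return {
        "size": 0,  # assumption: only the buckets are needed
        "aggs": {
            "events_by_date": {
                "date_histogram": {"field": "date", "interval": "day"},
                "aggs": {
                    "per_ip": {
                        # ip_bucket_size is an arbitrary illustrative limit
                        "terms": {"field": "ip", "size": ip_bucket_size},
                        "aggs": {
                            "maxcpu": {"max": {"field": "cpu0"}},
                            "maxin": {"max": {"field": "in_eth1(Mbps)"}},
                        },
                    }
                },
            }
        },
    }


body = max_per_ip_by_day()
print(json.dumps(body, indent=2))
```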
