I want to aggregate all the values of a field in my index whose length is greater than 6, within some date range.
I can fetch all values of the field, grouped by that keyword. Now I want to add a condition that checks whether the keyword length is more than 6.
Here is the query I have come up with so far:
"size": 0,
"aggs": {
"range":{
"date_range": {
"field": "timestamp",
"ranges": [
{
"from": "now-1d/d",
"to": "now"
}
]
},
"aggs": {
"group_by_name":{
"terms": {
"field": "name.keyword",
"size": 100
}
}
}
}
}
}
You can do this using a simple Painless script in the terms aggregation. Check out the aggregations docs.
{
"size": 0,
"aggs": {
"range": {
"date_range": {
"field": "timestamp",
"ranges": [
{
"from": "now-1d/d",
"to": "now"
}
]
},
"aggs": {
"group_by_name": {
"terms": {
"script": {
"source": """
if (doc['name.keyword'].value.toString().length() > 6) {
return doc['name.keyword'].value;
}
""",
"lang": "painless"
},
"size": 100
}
}
}
}
}
}
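An alternative that avoids scripting altogether, if your version supports it, is the terms aggregation's include parameter, which accepts a regular expression; .{7,} keeps only terms longer than 6 characters:
{
  "size": 0,
  "aggs": {
    "range": {
      "date_range": {
        "field": "timestamp",
        "ranges": [
          {
            "from": "now-1d/d",
            "to": "now"
          }
        ]
      },
      "aggs": {
        "group_by_name": {
          "terms": {
            "field": "name.keyword",
            "include": ".{7,}",
            "size": 100
          }
        }
      }
    }
  }
}
Since this filters on the indexed terms directly, it also tends to be cheaper than running a script per document.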
Hi, I have the below Elasticsearch query that I'm using in Dev Tools. I keep getting errors for my bool query, but it seems correct to me; it looks at the #timestamp field and tries to retrieve only one day's worth of data.
"input": {
"search": {
"request": {
"indices": [
"<iovation-*>"
],
"body": {
"size": 0,
"query": {
"bool": {
"must": {
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "logstash.load.duration",
"percents": 95,
"keyed": false
}
},
"dates": {
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "5m",
"min_doc_count": 1
}
}
}
}
}
}
}
},
Any help is appreciated, thanks!
There are a few errors in your query.
Whenever an aggregation is used along with a query, the structure is:
{
"query": {},
"aggs": {}
}
You are missing one } at the end of the query part.
Calendar intervals do not accept multiple quantities, so calendar_interval: "5m" is invalid; only single units like 1m or 1d are accepted.
Since you have a fixed 5-minute interval, use the fixed_interval parameter instead.
Modify your query as follows:
{
"size": 0,
"query": {
"bool": {
"must": {
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
} // note this
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "logstash.load.duration",
"percents": 95,
"keyed": false
}
},
"dates": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "5m", // note this
"min_doc_count": 1
}
}
}
}
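For reference, calendar_interval accepts only single calendar units (e.g. 1m/minute, 1h/hour, 1d/day, 1w, 1M, 1q, 1y). If you ever want calendar-aligned daily buckets instead of fixed 5-minute ones, the histogram portion would look something like this sketch:
"dates": {
  "date_histogram": {
    "field": "#timestamp",
    "calendar_interval": "day",
    "min_doc_count": 1
  }
}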
Here is my query:
GET _search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"serviceName.keyword": "directory-view-service"
}
},
{
"match": {
"path": "thewall"
}
},
{
"range": {
"#timestamp": {
"from": "now-31d",
"to": "now"
}
}
}
]
}
},
"aggs": {
"by_day": {
"date_histogram": {
"field": "date",
"interval": "7d"
},
"aggs": {
"byUserUid": {
"terms": {
"field": "token_userId.keyword",
"size": 150000
},
"aggs": {
"filterByCallNumber": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": {
"inline": "params.doc_count <= 1"
}
}
}
}
}
}
}
}
}
I want my query to return all users who called my endpoint at least once, over a 1-month range bucketed into 7-day intervals; up to that point everything works.
But my result is a list of buckets with 370 elements, and all I really need to know is the size of that array...
Is there a keyword for this, or how else can I handle it?
Thanks
I have mimicked how Kibana does a query search and have come up with the below query. Basically I'm looking for the last 6 days of data (including days where there is no data, since I need to feed it to a graph). But the returned buckets give me more than just those days. I would like to understand where I'm going wrong with this.
{
"version": true,
"size": 0,
"sort": [
{
"#timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"_source": {
"excludes": []
},
"aggs": {
"target_traffic": {
"date_histogram": {
"field": "#timestamp",
"interval": "1d",
"time_zone": "Asia/Kolkata",
"min_doc_count": 0,
"extended_bounds": {
"min": "now-6d/d",
"max": "now"
}
},
"aggs": {
"days_filter": {
"filter": {
"range": {
"#timestamp": {
"gt": "now-6d",
"lte": "now"
}
}
},
"aggs": {
"in_bytes": {
"sum": {
"field": "netflow.in_bytes"
}
},
"out_bytes": {
"sum": {
"field": "netflow.out_bytes"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"#timestamp",
"netflow.first_switched",
"netflow.last_switched"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
If you put the range filter inside your aggregation section without any date range in your query, your aggregations will run over all your data, and the metrics will be bucketed by day across everything.
The range query on #timestamp should be moved into the query section so that the aggregations are computed only on the data you want, i.e. the last 6 days.
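For example, keeping the fields from the question, the reshaped query might look like this sketch (once the range is part of the query, the days_filter sub-aggregation is no longer needed):
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        },
        {
          "range": {
            "#timestamp": {
              "gt": "now-6d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "target_traffic": {
      "date_histogram": {
        "field": "#timestamp",
        "interval": "1d",
        "time_zone": "Asia/Kolkata",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-6d/d",
          "max": "now"
        }
      },
      "aggs": {
        "in_bytes": {
          "sum": {
            "field": "netflow.in_bytes"
          }
        },
        "out_bytes": {
          "sum": {
            "field": "netflow.out_bytes"
          }
        }
      }
    }
  }
}
With min_doc_count: 0 and extended_bounds kept in the histogram, the empty days inside the 6-day window still show up as zero buckets, while days outside the window no longer appear.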
I'm grouping by offerId; each offer bucket then has two sub-buckets: price <= 0 and price > 0. I need to make sure that the price <= 0 bucket also includes documents where the price field is missing:
{
"size": 0,
"aggs": {
"by_offer_id": {
"terms": {
"field": "offerId"
},
"aggs": {
"by_price": {
"range": {
"field": "price",
"ranges": [
{
"to": 0
},
{
"from": 0
}
]
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
}
}
}
}
I've tried adding "missing": 0 after "field": "price", but it throws a SearchPhaseExecutionException.
I'm using 1.7.5, but potentially could use syntax from 2.4.x.
In this particular case I don't even need to set "missing": 0,
{
"size": 0,
"aggs": {
"by_offer_id": {
"terms": {
"field": "offerId"
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
}
}
because the terms aggregation returns the total document count, while the stats aggregation only includes documents with an existing price and reports how many there are. I can deduce how many documents don't have a price field by subtraction.
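If you would rather have Elasticsearch count those documents directly, the missing aggregation should also work (a sketch, assuming it is available on your version; the no_price name is just illustrative):
{
  "size": 0,
  "aggs": {
    "by_offer_id": {
      "terms": {
        "field": "offerId"
      },
      "aggs": {
        "no_price": {
          "missing": {
            "field": "price"
          }
        }
      }
    }
  }
}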
I think you should use a script, like this:
{
"size": 0,
"aggs": {
"by_offer_id": {
"terms": {
"field": "offerId"
},
"aggs": {
"by_price": {
"range": {
"script": {
"lang": "painless",
"source": "doc['price'].value ==null ? 0 : doc['price'].value"
},
"ranges": [
{
"to": 0
},
{
"from": 0
}
]
},
"aggs": {
"price_stats": {
"stats": {
"field": "price"
}
}
}
}
}
}
}
}
or
"source": "doc['price'].value * 1"
{
"query": {
"filtered": {
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"filter": {
"numeric_range": {
"time_taken": {
"gte": 10
}
}
}
}
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
I have to run this query 20 times, as I want to know the notification times above each of the following thresholds: [10, 30, 60, 120, 240, 300, 600, 1200, ..]. Right now I am running a loop and making 20 queries to fetch this.
Is there a saner way to query Elasticsearch once and get the counts for each of these thresholds?
What you probably want is a "range aggregation".
Here is a possible query; you can add more ranges or alter them:
{
"size": 0,
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"aggs": {
"intervals": {
"range": {
"field": "time_taken",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
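Since the thresholds in the question are cumulative (everything above each value), it is worth noting that the range aggregation accepts overlapping ranges, so a single query can cover the whole list. A sketch using the thresholds from the question (the aggregation name above_threshold is just illustrative):
{
  "size": 0,
  "query": {
    "match": {
      "log_path": "message_notification.log"
    }
  },
  "aggs": {
    "above_threshold": {
      "range": {
        "field": "time_taken",
        "ranges": [
          { "from": 10 },
          { "from": 30 },
          { "from": 60 },
          { "from": 120 },
          { "from": 240 },
          { "from": 300 },
          { "from": 600 },
          { "from": 1200 }
        ]
      },
      "aggs": {
        "distinct_user_ids": {
          "cardinality": {
            "field": "user_id"
          }
        }
      }
    }
  }
}
Each bucket then counts the documents with time_taken at or above that threshold, and the cardinality sub-aggregation gives the distinct users per bucket in a single round trip.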