malformed bool query elasticsearch - Elasticsearch watcher - elasticsearch

Hi, I have the Elasticsearch query below, which I am running in Dev Tools. I keep getting errors for my bool query, but it looks correct to me: it filters on the #timestamp field and should only retrieve one day's worth of data.
"input": {
"search": {
"request": {
"indices": [
"<iovation-*>"
],
"body": {
"size": 0,
"query": {
"bool": {
"must": {
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "logstash.load.duration",
"percents": 95,
"keyed": false
}
},
"dates": {
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "5m",
"min_doc_count": 1
}
}
}
}
}
}
}
},
Any help is appreciated thanks!

There are a few errors in your query.
Whenever an aggregation is used along with a query, the two are siblings at the top level of the body:
{
"query": {},
"aggs": {}
}
You are missing one } at the end of the query part, so aggs ends up nested inside query.
Calendar intervals do not accept multiple quantities such as 2d or 5m; only a single unit (1m, 1h, 1d, ...) is allowed.
For a fixed multiple of a unit, use the fixed_interval parameter instead.
Modify your query as follows:
{
"size": 0,
"query": {
"bool": {
"must": {
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
} // note this
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "logstash.load.duration",
"percents": 95,
"keyed": false
}
},
"dates": {
"date_histogram": {
"field": "#timestamp",
"fixed_interval": "5m", // note this
"min_doc_count": 1
}
}
}
}
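To make the two fixes above a bit more concrete, here is a minimal Python sketch that builds the same kind of body as a dict, keeps query and aggs as siblings, and picks the interval parameter from the interval string. The helper names (interval_param, build_body) and the exact list of calendar units are assumptions made for this example, not part of the original answer.

```python
import json
import re

# Calendar intervals only allow a single unit; this list is an assumption.
CALENDAR_UNITS = {"minute", "hour", "day", "week", "month", "quarter", "year",
                  "1m", "1h", "1d", "1w", "1M", "1q", "1y"}

def interval_param(interval: str) -> str:
    """Pick the date_histogram parameter name for an interval string."""
    if interval in CALENDAR_UNITS:
        return "calendar_interval"
    # Fixed intervals are a number followed by ms, s, m, h or d.
    if re.fullmatch(r"\d+(ms|s|m|h|d)", interval):
        return "fixed_interval"
    raise ValueError(f"unsupported interval: {interval!r}")

def build_body(field: str, interval: str) -> dict:
    """Search body with "query" and "aggs" as top-level siblings."""
    return {
        "size": 0,
        "query": {"bool": {"must": {"range": {field: {"gte": "now-1d"}}}}},
        "aggs": {
            "dates": {
                "date_histogram": {
                    "field": field,
                    interval_param(interval): interval,
                    "min_doc_count": 1,
                }
            }
        },
    }

body = build_body("#timestamp", "5m")
print(json.dumps(body, indent=2))
```

Building the body in code and dumping it as JSON also guarantees balanced braces, which is what went wrong in the hand-written version.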

Related

ElasticSearch: Nested buckets aggregation

I'm new to Elasticsearch, so this question could be quite trivial for you, but here I go:
I'm using kibana_sample_data_ecommerce, whose documents have a mapping like this:
{
...
"order_date" : <datetime>
"taxful_total_price" : <double>
...
}
I want to get the basic daily behavior of the data, expecting documents like this:
[
{
"qtime" : "00:00",
"mean" : 20,
"std" : 40
},
{
"qtime" : "01:00",
"mean" : 150,
"std" : 64
},
...
]
So, the process I think that I need to do is:
Group by day all records ->
Group by time window for each day ->
Sum all records in each time window ->
Cumulative Sum for each sum by time window, thus, I get behavior of a day ->
Extended_stats by the same time window across all days
And that can be expressed like this:
But I can't unwrap those buckets to compute those statistics. Could you give me some advice on how to do that operation and get that result?
Here is my current query (Kibana Dev Tools):
POST kibana_sample_data_ecommerce/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"order_date": {
"gt": "now-1M",
"lte": "now"
}
}
}
]
}
},
"aggs": {
"day_histo": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"qmin_histo": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "hour"
},
"aggs": {
"qminute_sum": {
"sum": {
"field": "taxful_total_price"
}
},
"cumulative_qminute_sum": {
"cumulative_sum": {
"buckets_path": "qminute_sum"
}
}
}
}
}
}
}
}
Here's how you pull off the extended stats:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"order_date": {
"gt": "now-4M",
"lte": "now"
}
}
}
]
}
},
"aggs": {
"by_day": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"by_hour": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "hour"
},
"aggs": {
"by_taxful_total_price": {
"extended_stats": {
"field": "taxful_total_price"
}
}
}
}
}
}
}
}
This yields one extended_stats result per hourly bucket inside each daily bucket.
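If you then want one row per hour of day across all days (the qtime / mean / std shape from the question), one option is to pool the nested buckets on the client. The sketch below is only an illustration under a few assumptions: the aggregation names match the query above (by_day, by_hour, by_taxful_total_price), bucket keys are epoch milliseconds in UTC, and the statistics are pooled over individual order values rather than over the cumulative sums described in the question. The function name stats_by_hour_of_day and the sample response are made up for the example.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

def stats_by_hour_of_day(response: dict) -> list:
    """Pool the per-day hourly extended_stats into one entry per hour of day."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0, "sum_sq": 0.0})
    for day in response["aggregations"]["by_day"]["buckets"]:
        for hour in day["by_hour"]["buckets"]:
            stats = hour["by_taxful_total_price"]
            if not stats["count"]:
                continue  # skip empty buckets, they carry no usable sums
            # Bucket keys are assumed to be epoch milliseconds in UTC.
            ts = datetime.fromtimestamp(hour["key"] / 1000, tz=timezone.utc)
            t = totals[ts.strftime("%H:00")]
            t["count"] += stats["count"]
            t["sum"] += stats["sum"]
            t["sum_sq"] += stats["sum_of_squares"]
    result = []
    for qtime in sorted(totals):
        t = totals[qtime]
        mean = t["sum"] / t["count"]
        # Population variance from the pooled sum and sum of squares.
        variance = max(t["sum_sq"] / t["count"] - mean ** 2, 0.0)
        result.append({"qtime": qtime, "mean": mean, "std": math.sqrt(variance)})
    return result

# Hypothetical response fragment: two days, one hourly bucket each.
sample = {"aggregations": {"by_day": {"buckets": [
    {"key": 0, "by_hour": {"buckets": [
        {"key": 0, "by_taxful_total_price":
            {"count": 2, "sum": 30.0, "sum_of_squares": 500.0}},
    ]}},
    {"key": 86400000, "by_hour": {"buckets": [
        {"key": 86400000, "by_taxful_total_price":
            {"count": 2, "sum": 70.0, "sum_of_squares": 2500.0}},
    ]}},
]}}}
result = stats_by_hour_of_day(sample)
print(result)
```

Pooling count, sum and sum_of_squares (rather than averaging the per-bucket averages) keeps the result correct when hours contain different numbers of orders.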

ElasticSearch query with prefix for aggregation

I am trying to add a prefix condition for my ES query in a "must" clause.
My current query looks something like this:
body = {
"query": {
"bool": {
"must":
{ "term": { "article_lang": 0 }}
,
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I need to add a mandatory condition to my query so that it only keeps articles whose id starts with "article-".
So far, I have tried this:
{
"query": {
"bool": {
"should": [
{ "term": { "article_lang": 0 }},
{ "prefix": { "article_id": {"value": "article-"} }}
],
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I am fairly new to ES, and from the documentation online I understand that "should" is used for "OR" conditions and "must" for "AND". The query above returns some data, but because of "should" it contains documents with either article_lang=0 or an id starting with article-. When I use "must", it doesn't return anything.
I am certain that there are articles with ids starting with this prefix because we currently iterate through the result to filter out such articles. What am I missing here?
In your prefix query, you need to use the article_id.keyword field, not article_id. Also, you should prefer filter over must, since you're simply doing yes/no matching (i.e. filtering) and don't need scoring:
{
"query": {
"bool": {
"filter": [ <-- change this
{
"term": {
"article_lang": 0
}
},
{
"prefix": {
"article_id.keyword": { <-- and this
"value": "article-"
}
}
},
{
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
]
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
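Since the question builds the body in Python (body = {...}), here is a rough sketch of the same idea as a small helper, with every condition in a single filter list. The function name build_article_query and its parameters are invented for illustration; the field names come from the question. The likely reason the prefix fails on article_id is that an analyzed text field is split into tokens (with the default analyzer "article-123" would become "article" and "123"), so no single token starts with "article-"; that is an assumption about the mapping, which the question does not show.

```python
import json

def build_article_query(prefix: str, lang: int, since: str) -> dict:
    """Bool query whose clauses all run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"article_lang": lang}},
                    # Prefix on the keyword sub-field, as the answer advises.
                    {"prefix": {"article_id.keyword": {"value": prefix}}},
                    {"range": {"created_time": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "articles": {
                "terms": {
                    "field": "article_id.keyword",
                    "order": {"score": "desc"},
                    "size": 1000,
                },
                "aggs": {"score": {"sum": {"field": "score"}}},
            }
        },
    }

body = build_article_query("article-", 0, "now-3h")
print(json.dumps(body["query"], indent=2))
```

All clauses in a filter list must match, so this behaves like "AND" without computing relevance scores.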

ES query ignoring time range filter

I have mimicked how Kibana runs a search and have come up with the query below. Basically, I'm looking for the last 6 days of data (including days with no data, since I need to feed it to a graph). But the returned buckets cover more than just those days. I would like to understand where I'm going wrong with this.
{
"version": true,
"size": 0,
"sort": [
{
"#timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"_source": {
"excludes": []
},
"aggs": {
"target_traffic": {
"date_histogram": {
"field": "#timestamp",
"interval": "1d",
"time_zone": "Asia/Kolkata",
"min_doc_count": 0,
"extended_bounds": {
"min": "now-6d/d",
"max": "now"
}
},
"aggs": {
"days_filter": {
"filter": {
"range": {
"#timestamp": {
"gt": "now-6d",
"lte": "now"
}
}
},
"aggs": {
"in_bytes": {
"sum": {
"field": "netflow.in_bytes"
}
},
"out_bytes": {
"sum": {
"field": "netflow.out_bytes"
}
}
}
}
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"#timestamp",
"netflow.first_switched",
"netflow.last_switched"
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
If you put the range filter inside the aggregation section without any date range in the query, the aggregations run on all your data, so the date_histogram creates a daily bucket for every day that has data, not just the last 6 days.
The range query on #timestamp should be moved into the query section so that aggregations are computed only on the data you want, i.e. the last 6 days.
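As a sketch of that change, the helper below adds a range clause to the bool query of an existing body so the aggregations only see documents in the window. The name with_time_range is hypothetical, and the helper assumes the body either has no query or has a bool query at the top level (which is the case for the query in the question).

```python
import copy

def with_time_range(body: dict, field: str, gt: str, lte: str) -> dict:
    """Return a copy of body whose bool query also restricts field to (gt, lte]."""
    new_body = copy.deepcopy(body)
    # Assumption: the query is missing or is a bool query.
    bool_query = new_body.setdefault("query", {}).setdefault("bool", {})
    filters = bool_query.setdefault("filter", [])
    if isinstance(filters, dict):  # a single clause may be given as an object
        filters = [filters]
    filters.append({"range": {field: {"gt": gt, "lte": lte}}})
    bool_query["filter"] = filters
    return new_body

original = {
    "size": 0,
    "query": {"bool": {"must": [
        {"query_string": {"query": "flow.traffic_locality: \"private\""}}
    ]}},
}
narrowed = with_time_range(original, "#timestamp", "now-6d", "now")
print(narrowed["query"]["bool"]["filter"])
```

With the range in the query, the days_filter sub-aggregation becomes redundant, while extended_bounds with min_doc_count 0 still fills in empty days for the graph.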

elasticsearch getting too many results, need help filtering query

I'm having a lot of trouble understanding the underlying ES query system.
For example, I've got the following query:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"referer": "www.xx.yy.com"
}
},
{
"range": {
"#timestamp": {
"gte": "now",
"lt": "now-1h"
}
}
}
]
}
},
"aggs": {
"interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "0.5h"
},
"aggs": {
"what": {
"cardinality": {
"field": "host"
}
}
}
}
}
}
That request loads too much data and fails:
"status" : 500, "reason" :
"ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [#timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested:
UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [#timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested: CircuitBreakingException[Data too
large, data for field [#timestamp] would be larger than limit of
[3200306380/2.9gb]]; "
I've also tried this request:
{
"size": 0,
"filter": {
"and": [
{
"term": {
"referer": "www.geoportail.gouv.fr"
}
},
{
"range": {
"#timestamp": {
"from": "2014-10-04",
"to": "2014-10-05"
}
}
}
]
},
"aggs": {
"interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "0.5h"
},
"aggs": {
"what": {
"cardinality": {
"field": "host"
}
}
}
}
}
}
I would like to filter the data so that I get a correct result. Any help would be much appreciated!
I found a solution, although it's kind of weird.
I followed dimzak's advice and cleared the cache:
curl --noproxy localhost -XPOST "http://localhost:9200/_cache/clear"
Then I used filtering instead of querying as Olly suggested:
{
"size": 0,
"query": {
"filtered": {
"query": {
"term": {
"referer": "www.xx.yy.fr"
}
},
"filter" : {
"range": {
"#timestamp": {
"from": "2014-10-04T00:00",
"to": "2014-10-05T00:00"
}
}
}
}
},
"aggs": {
"interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "0.5h"
},
"aggs": {
"what": {
"cardinality": {
"field": "host"
}
}
}
}
}
}
I cannot accept both answers; I think dimzak deserves it most, but thumbs up to both of you :)
You can try clearing the cache first and then executing the above query, as shown here.
Another solution may be to remove the interval or reduce the time range in your query.
My best bet would be to either clear the cache first or allocate more memory to Elasticsearch (more here).
Using a filter would improve performance:
{
"size": 0,
"query": {
"filtered": {
"query": {
"term": {
"referer": "www.xx.yy.com"
}
},
"filter": {
"range": {
"#timestamp": {
"gte": "now-1h",
"lt": "now"
}
}
}
}
},
"aggs": {
"interval": {
"date_histogram": {
"field": "#timestamp",
"interval": "0.5h"
},
"aggs": {
"what": {
"cardinality": {
"field": "host"
}
}
}
}
}
}
You may also find that a date range aggregation works better than a date histogram, although you need to define the buckets yourself.
Is the referer field being analyzed? If you want an exact match on it, set it to not_analyzed.
Is the cardinality of your hostname field high? Have you tried pre-hashing the values?
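Note for readers on newer versions: the filtered query used in this thread was removed in Elasticsearch 5.0, where the equivalent is a bool query with the scoring part under must and the non-scoring part under filter. The sketch below shows that rewrite as a small Python helper; the name filtered_to_bool is made up for the example, and the range bounds in the usage are written with the lower bound first.

```python
import json

def filtered_to_bool(query: dict) -> dict:
    """Rewrite {"filtered": {"query": q, "filter": f}} as a bool query."""
    if "filtered" not in query:
        return query  # nothing to rewrite
    inner = query["filtered"]
    bool_query = {}
    if "query" in inner:
        bool_query["must"] = inner["query"]      # scoring part
    if "filter" in inner:
        bool_query["filter"] = inner["filter"]   # yes/no part, no scoring
    return {"bool": bool_query}

old_query = {
    "filtered": {
        "query": {"term": {"referer": "www.xx.yy.com"}},
        "filter": {"range": {"#timestamp": {"gte": "now-1h", "lt": "now"}}},
    }
}
new_query = filtered_to_bool(old_query)
print(json.dumps(new_query, indent=2))
```

The rest of the body (size, aggs) stays the same; only the query part changes.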

Query elasticsearch with multiple numeric ranges

{
"query": {
"filtered": {
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"filter": {
"numeric_range": {
"time_taken": {
"gte": 10
}
}
}
}
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
I have to run this query 20 times because I want to know the notification times above each of the following thresholds: [10,30,60,120,240,300,600,1200..]. Right now, I am running a loop and making 20 queries to fetch this.
Is there a saner way to query Elasticsearch once and get the counts that fall above each of these thresholds?
What you probably want is a "range aggregation".
Here is a possible query; you can add more ranges or alter them:
{
"size": 0,
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"aggs": {
"intervals": {
"range": {
"field": "time_taken",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
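For the "above each threshold" case from the question, the ranges do not have to be adjacent: as far as I know the range aggregation accepts overlapping ranges, so one open-ended range per threshold should give all the counts in a single request. Here is a hedged Python sketch that generates such a body; the helper names (threshold_ranges, build_threshold_body) and the custom bucket keys are assumptions for this example.

```python
import json

def threshold_ranges(thresholds):
    """One open-ended bucket per threshold, i.e. time_taken >= t."""
    return [{"key": f">={t}", "from": t} for t in thresholds]

def build_threshold_body(thresholds):
    """Single search body covering every threshold at once."""
    return {
        "size": 0,
        "query": {"match": {"log_path": "message_notification.log"}},
        "aggs": {
            "intervals": {
                "range": {
                    "field": "time_taken",
                    "ranges": threshold_ranges(thresholds),
                },
                # Distinct users per threshold bucket, as in the question.
                "aggs": {
                    "distinct_user_ids": {"cardinality": {"field": "user_id"}}
                },
            }
        },
    }

body = build_threshold_body([10, 30, 60, 120, 240, 300, 600, 1200])
print(json.dumps(body["aggs"]["intervals"]["range"], indent=2))
```

Each bucket in the response would then correspond to one of the queries the loop was issuing before.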
