ES query ignoring time range filter - elasticsearch

I have mimicked how Kibana does a query search and have come up with the query below. Basically I'm looking for the last 6 days of data (including days where there is no data, since I need to feed it to a graph). But the returned buckets give me more than just those days. I would like to understand where I'm going wrong with this.
{
  "version": true,
  "size": 0,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "unmapped_type": "boolean"
      }
    }
  ],
  "_source": {
    "excludes": []
  },
  "aggs": {
    "target_traffic": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "time_zone": "Asia/Kolkata",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-6d/d",
          "max": "now"
        }
      },
      "aggs": {
        "days_filter": {
          "filter": {
            "range": {
              "@timestamp": {
                "gt": "now-6d",
                "lte": "now"
              }
            }
          },
          "aggs": {
            "in_bytes": {
              "sum": {
                "field": "netflow.in_bytes"
              }
            },
            "out_bytes": {
              "sum": {
                "field": "netflow.out_bytes"
              }
            }
          }
        }
      }
    }
  },
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    "@timestamp",
    "netflow.first_switched",
    "netflow.last_switched"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        }
      ]
    }
  }
}

If you put the range filter inside your aggregation section without any date range in your query, your aggregations will run over all your data, and the metrics will be bucketed by day over all of it.
The range query on @timestamp should be moved into the query section so that the aggregations are computed only on the data you want, i.e. the last 6 days.
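For example, a minimal sketch of the same request with the range moved into the bool query (field names taken from the question; the range bounds are aligned with the extended_bounds so the histogram covers exactly the last six days, and since scoring doesn't matter here a filter clause would work just as well as must):
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "( flow.src_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) OR flow.dst_addr: ( \"10.5.5.1\" OR \"10.5.5.2\" ) ) AND flow.traffic_locality: \"private\"",
            "analyze_wildcard": true,
            "default_field": "*"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-6d/d",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "target_traffic": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1d",
        "time_zone": "Asia/Kolkata",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "now-6d/d",
          "max": "now"
        }
      },
      "aggs": {
        "in_bytes": { "sum": { "field": "netflow.in_bytes" } },
        "out_bytes": { "sum": { "field": "netflow.out_bytes" } }
      }
    }
  }
}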

Related

malformed bool query elasticsearch - Elasticsearch watcher

Hi, I have the below Elasticsearch query that I'm running in Dev Tools. I keep getting errors for my bool query, but it seems correct: it looks at the @timestamp field and tries to retrieve only one day's worth of data.
"input": {
"search": {
"request": {
"indices": [
"<iovation-*>"
],
"body": {
"size": 0,
"query": {
"bool": {
"must": {
"range": {
"#timestamp": {
"gte": "now-1d"
}
}
}
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "logstash.load.duration",
"percents": 95,
"keyed": false
}
},
"dates": {
"date_histogram": {
"field": "#timestamp",
"calendar_interval": "5m",
"min_doc_count": 1
}
}
}
}
}
}
}
},
Any help is appreciated thanks!
There are a few errors in your query:
Whenever an aggregation is used along with a query, the structure is
{
  "query": {},
  "aggs": {}
}
You are missing one } at the end of the query part.
Calendar intervals do not accept multiple quantities like 2d, 2m, etc.
If you have a fixed interval, you can use the fixed_interval param instead.
Modify your query as follows:
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "range": {
          "@timestamp": {
            "gte": "now-1d"
          }
        }
      }
    } // note this
  },
  "aggs": {
    "percentiles": {
      "percentiles": {
        "field": "logstash.load.duration",
        "percents": 95,
        "keyed": false
      }
    },
    "dates": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "5m", // note this
        "min_doc_count": 1
      }
    }
  }
}

Elasticsearch single request to do Union query Top N

Not sure how to do an SQL-like UNION in Elasticsearch. I tried a bool query but it doesn't meet my requirement yet. For example, the document structure is:
{
  "id": "123",
  "authorId": 28,
  "title": "Five Ways to Tap into...",
  "byLine": "ashd jsabbdjs international",
  "category": "Cat1"
}
I need to find the top 5 matched "title" values in each "category" when the user types something. This can be done with multiple queries to Elasticsearch, but I was wondering if there are other ways to do it in one request.
Use a terms aggregation with a top_hits sub-aggregation:
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_5": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}
Here is a query which returns multiple buckets based on "category":
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "authorId": [
              1,
              28
            ]
          }
        }
      ],
      "should": [
        {
          "query_string": {
            "query": "*int*",
            "fields": [
              "title^2",
              "byLine^1"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_5": {
          "top_hits": {
            "size": 5
          }
        }
      }
    }
  }
}

Elasticsearch aggregation doesn't work with nested-type fields

I can't make an Elasticsearch aggregation + filter work with nested fields. The data schema (relevant part) looks like this:
"mappings": {
"rb": {
"properties": {
"project": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Essentially "rb" object contains a nested field called "project" which contains two more fields - "name" and "age". Query I'm running:
"aggs": {
"root": {
"aggs": {
"group": {
"aggs": {
"filtered": {
"aggs": {
"order": {
"percentiles": {
"field": "project.age",
"percents": ["50"]
}
}
},
"filter": {
"range": {
"last_updated": {
"gte": "2015-01-01",
"lt": "2015-07-01"
}
}
}
}
},
"terms": {
"field": "project.name",
"min_doc_count": 5,
"order": {
"filtered>order.50": "asc"
},
"shard_size": 10,
"size": 10
}
}
},
"nested": {
"path": "project"
}
}
}
This query is supposed to produce the top 10 projects (project.name field) which match the date filter, sorted by their median age, ignoring projects with fewer than 5 mentions in the database. The median should be calculated only for projects matching the filter (date range).
Despite having more than a hundred thousand objects in the database, this query produces an empty list. No errors, just an empty response. I've tried it both on ES 1.6 and ES 2.0-beta.
I've re-organized your aggregation query a bit and I could get some results showing up. The main point is the nested type: since you are aggregating on a nested type, I took the filter aggregation on the last_updated field out and moved it up the hierarchy as the first aggregation. Then comes the nested aggregation on the project field, and finally the terms and the percentiles.
That seems to work out pretty well. Please try:
{
  "size": 0,
  "aggs": {
    "filtered": {
      "filter": {
        "range": {
          "last_updated": {
            "gte": "2015-01-01",
            "lt": "2015-07-01"
          }
        }
      },
      "aggs": {
        "root": {
          "nested": {
            "path": "project"
          },
          "aggs": {
            "group": {
              "terms": {
                "field": "project.name",
                "min_doc_count": 5,
                "shard_size": 10,
                "order": {
                  "order.50": "asc"
                },
                "size": 10
              },
              "aggs": {
                "order": {
                  "percentiles": {
                    "field": "project.age",
                    "percents": [
                      "50"
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

elasticsearch getting too many results, need help filtering query

I'm having a lot of trouble understanding the underlying ES querying system.
I've got the following query, for example:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "referer": "www.xx.yy.com"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now",
              "lt": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}
That request hits too much data and fails with the following error:
"status" : 500, "reason" :
"ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [#timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested:
UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException:
Data too large, data for field [#timestamp] would be larger than limit
of [3200306380/2.9gb]]; nested: CircuitBreakingException[Data too
large, data for field [#timestamp] would be larger than limit of
[3200306380/2.9gb]]; "
I've tried this request:
{
  "size": 0,
  "filter": {
    "and": [
      {
        "term": {
          "referer": "www.geoportail.gouv.fr"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": "2014-10-04",
            "to": "2014-10-05"
          }
        }
      }
    ]
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}
I would like to filter the data so I can get a correct result; any help would be much appreciated!
I found a solution; it's kind of weird.
I followed dimzak's advice and cleared the cache:
curl --noproxy localhost -XPOST "http://localhost:9200/_cache/clear"
Then I used filtering instead of querying, as Olly suggested:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "term": {
          "referer": "www.xx.yy.fr"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-10-04T00:00",
            "to": "2014-10-05T00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}
I cannot give the answer to both of you; I think dimzak deserves it most, but thumbs up to both of you guys :)
You can try clearing the cache first and then executing the above query, as shown here.
Another solution may be to remove the interval or reduce the time range in your query...
My best bet would be to either clear the cache first or allocate more memory to Elasticsearch (more here).
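For what it's worth, on the 1.x line the two usual knobs are the JVM heap and the fielddata circuit breaker (which is what produced the "Data too large" error). The snippet below is only an illustrative sketch; the 8g heap and 75% limit are made-up values you would tune to your own node:
# illustrative only: give the JVM more heap before starting the node (ES 1.x)
export ES_HEAP_SIZE=8g

# or raise the fielddata circuit breaker limit dynamically (default is 60% of heap)
curl -XPUT "http://localhost:9200/_cluster/settings" -d '{
  "transient": {
    "indices.breaker.fielddata.limit": "75%"
  }
}'
Raising the limit only buys headroom, though; filtering the query down, as suggested below, is what actually reduces the fielddata being loaded.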
Using a filter would improve performance:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "term": {
          "referer": "www.xx.yy.com"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "gte": "now",
            "lt": "now-1h"
          }
        }
      }
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}
You may also find that a date range aggregation works better than a date histogram - you have to define the buckets yourself, as sketched below.
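For example, a date_range aggregation with explicit buckets would look roughly like this (the bucket boundaries are illustrative):
"aggs": {
  "interval": {
    "date_range": {
      "field": "@timestamp",
      "ranges": [
        { "from": "2014-10-04T00:00", "to": "2014-10-04T12:00" },
        { "from": "2014-10-04T12:00", "to": "2014-10-05T00:00" }
      ]
    },
    "aggs": {
      "what": {
        "cardinality": {
          "field": "host"
        }
      }
    }
  }
}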
Is the referer field being analyzed, or do you want an exact match on it? If so, set it to not_analyzed.
Is there much cardinality in your host field? Have you tried pre-hashing the values?
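If you go down that road, the mapping changes might look roughly like this; the not_analyzed setting and the murmur3 hash sub-field (the documented way to pre-hash values for the cardinality aggregation on 1.x) are a sketch, not something tested against your index:
"properties": {
  "referer": {
    "type": "string",
    "index": "not_analyzed"
  },
  "host": {
    "type": "string",
    "fields": {
      "hash": {
        "type": "murmur3"
      }
    }
  }
}
The cardinality aggregation would then point at host.hash instead of host.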

ElasticSearch - significant term aggregation with range

I would like to know how I can add a range to a significant terms aggregation query. For example, this:
{
  "query": {
    "terms": {
      "text_content": [
        "searchTerm"
      ]
    },
    "range": {
      "dateField": {
        "from": "date1",
        "to": "date2"
      }
    }
  },
  "aggregations": {
    "significantQTypes": {
      "significant_terms": {
        "field": "field1",
        "size": 10
      }
    }
  },
  "size": 0
}
will not work. Any suggestions on how to specify the range?
Instead of using a range query, use a range filter as the relevance/score doesn't seem to matter in your case.
Then, in order to combine your query with a range filter, you should use a filtered query (see documentation).
Try something like this :
{
  "query": {
    "filtered": {
      "query": {
        "terms": {
          "text_content": [
            "searchTerm"
          ]
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "from": "date1",
            "to": "date2"
          }
        }
      }
    }
  },
  "aggs": {
    "significantQTypes": {
      "significant_terms": {
        "field": "field1",
        "size": 10
      }
    }
  },
  "size": 0
}
Hope this helps!
