I have a query like this:
https://pastebin.com/9YK6WxEJ
This gives me:
https://pastebin.com/ranpCnzG
Now, the buckets are fine, but I want the documents' data grouped by bucket name, not just their count in doc_count. Is there any way to do that?
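Maybe this works for you? It nests a top_hits sub-aggregation inside each range bucket, so the matching documents are returned per bucket: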
"aggs": {
"rating_ranges": {
"range": {
"field": "AggregateRating",
"keyed": true,
"ranges": [
{
"key": "bad",
"to": 3
},
{
"key": "average",
"from": 3,
"to": 4
},
{
"key": "good",
"from": 4
}
]
},
"aggs": {
"hits": {
"top_hits": {
"size": 100,
"sort": [
{
"AggregateRating": {
"order": "desc"
}
}
]
}
}
}
}
}
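To make the shape of the result concrete, here is a minimal Python sketch of how the per-bucket documents could be pulled out of the response on the client side. It assumes the aggregation names from the snippet above ("rating_ranges" and a top_hits sub-aggregation named "hits") and a keyed response; the sample response and the Name field are made up for illustration and trimmed to the relevant parts.

```python
def documents_by_bucket(response, agg_name="rating_ranges", hits_name="hits"):
    """Map each keyed range bucket to the _source of its top_hits documents."""
    buckets = response["aggregations"][agg_name]["buckets"]
    grouped = {}
    for bucket_key, bucket in buckets.items():
        # A top_hits result is nested as <sub-agg name> -> "hits" -> "hits".
        inner_hits = bucket[hits_name]["hits"]["hits"]
        grouped[bucket_key] = [hit["_source"] for hit in inner_hits]
    return grouped


# Hypothetical, trimmed response used only to exercise the function.
sample_response = {
    "aggregations": {
        "rating_ranges": {
            "buckets": {
                "bad": {
                    "doc_count": 1,
                    "hits": {"hits": {"total": 1, "hits": [
                        {"_id": "1", "_source": {"Name": "A", "AggregateRating": 2.5}},
                    ]}},
                },
                "good": {
                    "doc_count": 2,
                    "hits": {"hits": {"total": 2, "hits": [
                        {"_id": "2", "_source": {"Name": "B", "AggregateRating": 4.8}},
                        {"_id": "3", "_source": {"Name": "C", "AggregateRating": 4.1}},
                    ]}},
                },
            }
        }
    }
}

grouped = documents_by_bucket(sample_response)
```

Note that top_hits only returns up to "size" documents per bucket (100 in the snippet above), so very large buckets would be truncated.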
I have a list of products (deal entities) and I'm attempting to create a bucket aggregation by categories, ordered by the sum of available_stock.
This all works fine, but I want to exclude from the resulting aggregation every category whose level is not 1 (in other words, I only want to keep buckets for categories where level is 1).
I am aware that Elasticsearch provides "exclude" and "include" parameters, but these only work on the same field I'm aggregating on (deal.category.id in this case).
This is my sample deal document:
{
"_source": {
"id": 392745,
"category": [
{
"id": 17575,
"level": 2
},
{
"id": 17574,
"level": 1
},
{
"id": 17572,
"level": 0
}
],
"stats": {
"available_stock": 500
}
}
}
And this would be the query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
}
}
},
"aggs": {
"mainAggregation": {
"terms": {
"field": "deal.category.id",
"order": {
"available_stock": "desc"
},
"size": 3
},
"aggs": {
"available_stock": {
"sum": {
"field": "deal.stats.available_stock"
}
}
}
}
},
"size": 0
}
And this is my resulting aggregation, which sadly includes category 17572 with level 0:
{
"aggregations": {
"mainAggregation": {
"buckets": [
{
"key": 17572,
"doc_count": 30,
"available_stock": {
"value": 24000
}
},
{
"key": 17598,
"doc_count": 10,
"available_stock": {
"value": 12000
}
},
{
"key": 17602,
"doc_count": 8,
"available_stock": {
"value": 6000
}
}
]
}
}
}
P.S.: Currently on Elasticsearch 1.6.
Update 1: Still stuck on the problem after experimenting with various combinations of sub-aggregations.
I have found this impossible to solve and decided to go with two separate queries.
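The post does not say what the two queries were. One possible reading, sketched below in Python, is to fetch the level-1 category ids separately and then filter the aggregation buckets on the client. Both helper functions are hypothetical, and the bucket values in the example are invented; only the document shape comes from the sample above.

```python
def level_one_category_ids(documents):
    """Collect the ids of all categories that have level == 1."""
    ids = set()
    for doc in documents:
        for category in doc["_source"]["category"]:
            if category["level"] == 1:
                ids.add(category["id"])
    return ids


def keep_buckets(buckets, allowed_ids):
    """Drop every terms bucket whose key is not an allowed category id."""
    return [bucket for bucket in buckets if bucket["key"] in allowed_ids]


# Result of a first (hypothetical) query: the sample deal document.
documents = [{
    "_source": {
        "id": 392745,
        "category": [
            {"id": 17575, "level": 2},
            {"id": 17574, "level": 1},
            {"id": 17572, "level": 0},
        ],
        "stats": {"available_stock": 500},
    }
}]

# Buckets from a second (hypothetical) aggregation query; values are invented.
buckets = [
    {"key": 17572, "doc_count": 30, "available_stock": {"value": 24000}},
    {"key": 17574, "doc_count": 12, "available_stock": {"value": 9000}},
]

filtered = keep_buckets(buckets, level_one_category_ids(documents))
```

Because buckets are removed after the fact, the aggregation query would need a "size" larger than the number of buckets finally wanted.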
I'm using Elastica's query builder to create queries for Elasticsearch (version 5.3).
I have around 1600 documents indexed in a particular index and type.
When I perform a search in that index with an empty string as the query, I only get around 440 hits.
The generated query is:
{
"query": {
"bool": {
"should": [{
"multi_match": {
"query": "",
"fields": ["<field_1>^5", "<field_2>^4", "<field_3>^1", "<field_4>^2"],
"fuzziness": "AUTO"
}
}]
}
},
"from": 0,
"size": 20,
"aggs": {
"<agg_name_1>": {
"terms": {
"field": "<agg_field_1>"
}
},
"<agg_name_2>": {
"terms": {
"field": "<agg_field_2>"
}
},
"<agg_name_3>": {
"terms": {
"field": "<agg_field_3>"
}
},
"<agg_name_4>": {
"terms": {
"field": "<agg_field_4>"
}
},
"<agg_name_5>": {
"terms": {
"field": "<agg_field_5>"
}
},
"<date_agg_name>": {
"date_range": {
"field": "<agg_field_date_1>",
"keyed": true,
"ranges": [{
"from": "now\/d",
"key": "NOW\/DAY TO *"
}, {
"from": "now-2d\/d",
"key": "NOW\/DAY-2DAY TO *"
}, {
"from": "now-7d\/d",
"key": "NOW\/DAY-7DAY TO *"
}, {
"from": "now-30d\/d",
"key": "NOW\/DAY-30DAY TO *"
}]
}
},
"<agg_name_integer>": {
"range": {
"field": "<agg_field_integer>",
"keyed": true,
"ranges": [{
"to": 1,
"key": "1"
}, {
"to": 2,
"key": "2"
}, {
"to": 3,
"key": "3"
}, {
"to": 4,
"key": "4"
}, {
"to": 5,
"key": "5"
}]
}
}
}
}
I thought that since the query is an empty string, it should match all the documents, so why is it only matching a subset of them? I also tried changing should to must, but there was no difference.
Is it because of multi_match, fuzziness, or fields?
P.S. The actual field names have been replaced with placeholders.
If the query string is empty, you can use a match_all query instead:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html
As far as I have noticed, multi_match does not work with empty keywords.
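A minimal sketch of that switch, written in Python rather than with Elastica's builder: the function name build_query is hypothetical, and the field names are the placeholders from the question. It returns a match_all clause for empty or whitespace-only input and the original multi_match clause otherwise.

```python
def build_query(text, fields):
    """Return a match_all clause for empty input, multi_match otherwise."""
    if not text or not text.strip():
        # Nothing to search for: match every document.
        return {"match_all": {}}
    return {
        "multi_match": {
            "query": text,
            "fields": fields,
            "fuzziness": "AUTO",
        }
    }


fields = ["<field_1>^5", "<field_2>^4", "<field_3>^1", "<field_4>^2"]
empty_search = build_query("", fields)
text_search = build_query("shoes", fields)
```

The returned clause would go where the multi_match clause sits in the generated query; the rest of the request body (from, size, aggs) stays unchanged.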
We are performing a three-level aggregation over a date range: we need the distinct "Website" names, grouped by distinct "HitCount" value, grouped by "DateTime" intervals. The date_histogram aggregation lets us fetch the documents per interval, but the "key_as_string" of each bucket is always aligned to 12 AM instead of to the start time of the date range given in the query. Depending on the interval value, the day (24 hours starting from 12 AM) is divided up and the aggregation output is reported for those slices.
For example, we have given a from time of "2015-11-10T11:00:00" and a to time of "2015-11-13T11:00:00" with an interval of 8 hours.
Following is the query used:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"DateTime": {
"from": "2015-11-10T11:00:00",
"to": "2015-11-13T11:00:00"
}
}
}
]
}
}
}
},
"aggs": {
"Website": {
"terms": {
"field": "Website",
"size": 0,
"order": {
"_count": "desc"
}
},
"aggs": {
"HitCount": {
"terms": {
"field": "HitCount",
"size": 0,
"order": {
"_count": "desc"
}
},
"aggs": {
"DateTime": {
"date_histogram": {
"field": "DateTime",
"interval": "8h",
"min_doc_count": 0,
"extended_bounds": {
"min": 1447153200000,
"max": 1447412400000
}
}
}
}
}
}
}
}
}
The query output for the third-level DateTime aggregation is:
"DateTime": {
"buckets": [
{
"key_as_string": "2015-11-10T08:00:00.000Z",
"key": 1447142400000,
"doc_count": 62698
}
,
{
"key_as_string": "2015-11-10T16:00:00.000Z",
"key": 1447171200000,
"doc_count": 248118
}
,
{
"key_as_string": "2015-11-11T00:00:00.000Z",
"key": 1447200000000,
"doc_count": 224898
}
,
{
"key_as_string": "2015-11-11T08:00:00.000Z",
"key": 1447228800000,
"doc_count": 221663
}
,
{
"key_as_string": "2015-11-11T16:00:00.000Z",
"key": 1447257600000,
"doc_count": 220935
}
,
{
"key_as_string": "2015-11-12T00:00:00.000Z",
"key": 1447286400000,
"doc_count": 219340
}
,
{
"key_as_string": "2015-11-12T08:00:00.000Z",
"key": 1447315200000,
"doc_count": 218452
}
,
{
"key_as_string": "2015-11-12T16:00:00.000Z",
"key": 1447344000000,
"doc_count": 190
}
,
{
"key_as_string": "2015-11-13T00:00:00.000Z",
"key": 1447372800000,
"doc_count": 0
}
,
{
"key_as_string": "2015-11-13T08:00:00.000Z",
"key": 1447401600000,
"doc_count": 0
}
]
}
Expected Output:
Here, we would expect the intervals to be divided and queried as:
2015-11-10T11:00:00 to 2015-11-10T19:00:00
2015-11-10T19:00:00 to 2015-11-11T03:00:00
2015-11-11T03:00:00 to 2015-11-11T11:00:00
2015-11-11T11:00:00 to 2015-11-11T19:00:00
2015-11-11T19:00:00 to 2015-11-12T03:00:00
2015-11-12T03:00:00 to 2015-11-12T11:00:00
2015-11-12T11:00:00 to 2015-11-12T19:00:00
2015-11-12T19:00:00 to 2015-11-13T03:00:00
2015-11-13T03:00:00 to 2015-11-13T11:00:00
i.e. the "key_as_string" output values should be 2015-11-10T11:00:00, 2015-11-10T19:00:00, and so on.
We need this because we have given from and to times of 11 AM, so that each time we fire the query the buckets cover up-to-date 8-hour windows rather than a fixed division of the calendar day.
Note: ES 1.7 is used.
The documentation explains that you can use the offset parameter of date_histogram to shift where the buckets start.
So:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"range": {
"DateTime": {
"from": "2015-11-10T11:00:00",
"to": "2015-11-13T11:00:00"
}
}
}
]
}
}
}
},
"aggs": {
"Website": {
"terms": {
"field": "Website",
"size": 0,
"order": {
"_count": "desc"
}
},
"aggs": {
"HitCount": {
"terms": {
"field": "HitCount",
"size": 0,
"order": {
"_count": "desc"
}
},
"aggs": {
"DateTime": {
"date_histogram": {
"field": "DateTime",
"interval": "8h",
"min_doc_count": 0,
"offset": "+11h"
}
}
}
}
}
}
}
}
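To see why the offset helps, here is a small Python sketch of how a fixed-interval histogram assigns a timestamp to a bucket. It is a simplified model (epoch milliseconds, UTC, no time zone handling), not Elasticsearch's actual implementation, and bucket_key is a hypothetical helper.

```python
HOUR_MS = 60 * 60 * 1000


def bucket_key(timestamp_ms, interval_ms, offset_ms=0):
    """Start of the bucket containing timestamp_ms (simplified model)."""
    # Shift back by the offset, round down to the interval, shift forward.
    return (timestamp_ms - offset_ms) // interval_ms * interval_ms + offset_ms


range_start = 1447153200000  # 2015-11-10T11:00:00Z, the "from" time

# Without an offset the first bucket starts at 08:00, as in the question.
without_offset = bucket_key(range_start, 8 * HOUR_MS)

# With an 11h offset the first bucket starts exactly at the "from" time.
with_offset = bucket_key(range_start, 8 * HOUR_MS, 11 * HOUR_MS)
```

In this model without_offset is 1447142400000, the first key in the question's output, while with_offset equals range_start, so the buckets begin at 11:00, 19:00, 03:00, and so on.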
I ran into some confusing behaviour in Elasticsearch (version 1.7.1). According to the documentation (https://www.elastic.co/guide/en/elasticsearch/guide/current/_filtering_queries_and_aggregations.html), a filter applied to the query is also applied to the aggregations. When I issued the following query, I got unexpected results.
{
"aggregations": {
"outer": {
"aggregations": {
"inner": {
"date_histogram": {
"extended_bounds": {
"min": 0
},
"field": "time",
"interval": "30d",
"min_doc_count": 0,
"order": {
"_key": "desc"
}
}
}
},
"terms": {
"field": "ad_id",
"size": 10
}
}
},
"query": {
"filtered": {
"filter": {
"and": {
"filters": [
{
"range": {
"time": {
"from": 1441619173000,
"include_lower": false,
"include_upper": true,
"to": 1442835370000
}
}
}
]
}
}
}
}
}
A portion of the result is here:
{
"buckets": [
{
"key": 203737,
"doc_count": 27,
"inner": {
"buckets": [
{
"key_as_string": "2015-09-02T00:00:00.000Z",
"key": 1441152000000,
"doc_count": 27
},
{
"key_as_string": "1970-01-31T00:00:00.000Z",
"key": 2592000000,
"doc_count": 0
},
...
{
"key_as_string": "1970-01-01T00:00:00.000Z",
"key": 0,
"doc_count": 0
}
]
}
}
]
}
Please note that the aggregation result includes keys outside the range I have applied. The type of the time field is date. I have also tried the following query, but the result was the same.
{
"aggs": {
"outer_filter": {
"filter": {
"and": {
"filters": [
{
"range": {
"time": {
"from": 1441619173000,
"include_lower": false,
"include_upper": true,
"to": 1442835370000
}
}
}
]
}
},
"aggs": {
"outer_term": {
"terms": {
"field": "ad_id",
"size": 10
},
"aggs": {
"inner": {
"date_histogram": {
"extended_bounds": {
"min": 0
},
"field": "time",
"interval": "30d",
"min_doc_count": 0,
"order": {
"_key": "desc"
}
}
}
}
}
}
}
}
}
My problem is that the aggregation result includes results outside the filters ("from": 1441619173000,"to": 1442835370000).
Why are the filters not getting applied?
Can anyone help, please?
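The extended_bounds min value is the problem. Since min is 0 and the field is of type date, the buckets start from 1970 (epoch 0), and with min_doc_count set to 0 the histogram emits every empty bucket from there up to your data. The filter itself is being applied: the extra buckets all have a doc_count of 0.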
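A rough Python sketch of the arithmetic, using the numbers from the question: it counts how many empty buckets a given extended_bounds min forces in front of the data, and builds a date_histogram whose bounds follow the filter range instead. Both helpers are hypothetical, and the bucketing is a simplified fixed-interval model in epoch milliseconds.

```python
DAY_MS = 24 * 60 * 60 * 1000


def empty_leading_buckets(bounds_min_ms, range_start_ms, interval_ms):
    """Number of empty buckets between bounds_min and the first data bucket."""
    first_key = bounds_min_ms // interval_ms * interval_ms
    data_key = range_start_ms // interval_ms * interval_ms
    return max(0, (data_key - first_key) // interval_ms)


def bounded_histogram(field, interval, start_ms, end_ms):
    """date_histogram whose extended_bounds match the filtered range."""
    return {
        "date_histogram": {
            "field": field,
            "interval": interval,
            "min_doc_count": 0,
            "extended_bounds": {"min": start_ms, "max": end_ms},
        }
    }


range_from, range_to = 1441619173000, 1442835370000

padded = empty_leading_buckets(0, range_from, 30 * DAY_MS)
unpadded = empty_leading_buckets(range_from, range_from, 30 * DAY_MS)
inner_agg = bounded_histogram("time", "30d", range_from, range_to)
```

With min set to 0 this model gives 556 empty 30-day buckets before the one starting at 1441152000000; with min set to the filter's lower bound it gives none.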
You appear to have the range filter confused with the range aggregation.
The range filter is normally written with two kinds of parameters: gte or gt (greater than) and lte or lt (less than).
The from/to parameters are the ones used by the range aggregation, which splits your results into user-defined buckets.
Is there a way to simplify and optimize the following query:
{
"query": {
"filtered": {
"filter": {
"and": [
{
"range": {
"ts": {
"gte": "2014-12-18",
"lte": "2014-12-18"
}
}
}
]
},
"query": {
"match": {
"track_events.event": "render"
}
}
}
},
"aggs": {
"per_type": {
"terms": {
"field": "type",
"order": {
"_count": "desc"
},
"size": 0
},
"aggs": {
"per_hour": {
"terms": {
"script": "(doc[\"track_events.ts\"].value - doc[\"ts\"].value)/(1000 * 3600)",
"order": {
"_count": "desc"
},
"size": 0
}
}
}
}
}
}
The index in Elasticsearch contains documents with the fields track_events.ts and ts. The purpose is to count how many occurrences fall into each hourly interval of the difference between track_events.ts and ts.
Example response:
"buckets": [{
"key": "0",
"doc_count": 67736997
},
{
"key": "1",
"doc_count": 7193214
},
{
"key": "2",
"doc_count": 3406966
},
{
"key": "3",
"doc_count": 1988135
}]
}
which means that 67736997 documents were found with a time difference of less than 1 hour, 7193214 with a time difference of at least 1 but less than 2 hours, etc.
The biggest performance gain would come from replacing the script.
i.e. instead of doing:
(doc[\"track_events.ts\"].value - doc[\"ts\"].value)/(1000 * 3600)
pre-calculate this value when loading the data into Elasticsearch and put it into another field. Then do the terms aggregation on this field instead.
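A minimal sketch of that pre-calculation in Python, to be run on each document before it is indexed. It assumes both timestamps are available as epoch milliseconds and that track_events is a single object; the field name hours_between and both helpers are hypothetical.

```python
MS_PER_HOUR = 1000 * 3600


def with_hours_between(doc, field_name="hours_between"):
    """Return a copy of doc with the whole-hour difference precomputed."""
    enriched = dict(doc)
    delta_ms = doc["track_events"]["ts"] - doc["ts"]
    # Floor division; assumes integer epoch-millisecond timestamps.
    enriched[field_name] = delta_ms // MS_PER_HOUR
    return enriched


def per_hour_agg(field_name="hours_between"):
    """terms aggregation on the precomputed field, replacing the script."""
    return {
        "terms": {
            "field": field_name,
            "order": {"_count": "desc"},
            "size": 0,
        }
    }


doc = {
    "type": "banner",  # invented value
    "ts": 1418860800000,
    "track_events": {"event": "render", "ts": 1418860800000 + 5400000},
}
enriched = with_hours_between(doc)
```

Here the event is 90 minutes after ts, so the document lands in bucket 1. The per_hour sub-aggregation in the query would then use per_hour_agg() in place of the scripted terms block.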