Multiple filters and an aggregate in elasticsearch - elasticsearch

How can I use a filter in connection with an aggregate in elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it with, e.g., the Postgres documentation).
By trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using a query instead of a filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog, or a book that describes writing non-trivial queries, with at least an aggregate plus multiple filters?

I ended up using a filter aggregation, not a filtered query. So now I have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}

Put your filter in a filtered-query.
The top-level filter is for filtering search hits only, and not facets/aggregations. It was renamed to post_filter in 1.0 due to this quite common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
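To make the difference concrete, here is a minimal Python sketch (standard library only) that builds both request bodies for a pre-5.0 cluster. The helper names `with_post_filter` and `with_filtered_query` are hypothetical, and the percentile aggregation is simply borrowed from the question; treat this as an illustration rather than a definitive implementation.

```python
import json

# Filter clauses borrowed from the question; any list of clauses would do.
FILTERS = [
    {"term": {"dc": "eu-west-12"}},
    {"term": {"status": "204"}},
]

AGGS = {
    "p98": {
        "percentiles": {"field": "upstream_response_time", "percents": [98.0]}
    }
}


def with_post_filter(filters, aggs):
    """Top-level post_filter: hits are filtered, aggregations still see every document."""
    return {"post_filter": {"bool": {"must": filters}}, "size": 0, "aggs": aggs}


def with_filtered_query(filters, aggs):
    """Filter inside the query: aggregations only see the matching documents."""
    return {
        "query": {"filtered": {"filter": {"bool": {"must": filters}}}},
        "size": 0,
        "aggs": aggs,
    }


if __name__ == "__main__":
    print(json.dumps(with_filtered_query(FILTERS, AGGS), indent=2))
```

Only the second body restricts what the aggregation is computed over, which is what the question was after.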

More on @geekQ's answer: to support filter strings containing spaces when searching on multiple terms, use match_phrase as below:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }

Just for reference: as of version 7.2, I used something like the following to apply multiple filters to an aggregation.
It uses a filter aggregation to restrict the documents being aggregated, and a bool query to combine the conditions.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
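If several aggregations need the same treatment, the wrapping can be factored into a small helper. The following Python sketch is only an illustration: `filter_agg` is a hypothetical name, and it places every clause in the bool query's filter list, which is an assumption on my part (inside a filter aggregation no scoring happens, so for matching purposes it behaves the same as must).

```python
import json


def filter_agg(name, clauses, sub_aggs):
    """Wrap sub_aggs in a named filter aggregation whose filter is a bool query."""
    return {
        name: {
            "filter": {"bool": {"filter": list(clauses)}},
            "aggs": sub_aggs,
        }
    }


body = {
    "size": 0,
    "aggs": filter_agg(
        "test",
        [
            {"term": {"genre": "action"}},
            {"range": {"year": {"gte": 1800, "lte": 3000}}},
        ],
        {"year_hist": {"histogram": {"field": "year", "interval": 50}}},
    ),
}

# The printed JSON can be sent as the body of POST movies/_search.
print(json.dumps(body, indent=2))
```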

Related

ElasticSearch query with prefix for aggregation

I am trying to add a prefix condition for my ES query in a "must" clause.
My current query looks something like this:
body = {
"query": {
"bool": {
"must":
{ "term": { "article_lang": 0 }}
,
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I need to add a mandatory condition to my query to filter articles whose id starts with "article-".
So far, I have tried this:
{
"query": {
"bool": {
"should": [
{ "term": { "article_lang": 0 }},
{ "prefix": { "article_id": {"value": "article-"} }}
],
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I am fairly new to ES, and from the documentation online I understand that "should" is used for "OR" conditions and "must" for "AND". The query above returns some data, but as written it matches documents that have either article_lang=0 or an id starting with article-. When I use "must", it doesn't return anything.
I am certain that there are articles whose id starts with this prefix, because we currently iterate through the results to filter such articles out. What am I missing here?
In your prefix query, you need to use the article_id.keyword field, not article_id. Also, you should prefer filter over must, since you're simply doing yes/no matching (aka filtering):
{
"query": {
"bool": {
"filter": [ <-- change this
{
"term": {
"article_lang": 0
}
},
{
"prefix": {
"article_id.keyword": { <-- and this
"value": "article-"
}
}
},
{
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
]
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
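Since the question builds the request as a Python dict (`body = {...}`), the same idea can be sketched as a small function. The name `recent_articles_query` and its parameters are hypothetical; the sketch assumes all three conditions are plain yes/no filters and therefore keeps them in one filter list.

```python
import json


def recent_articles_query(prefix, lang=0, since="now-3h"):
    """Build a search body whose conditions all live in one bool filter list."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"article_lang": lang}},
                    # prefix is matched against the un-analyzed keyword sub-field
                    {"prefix": {"article_id.keyword": {"value": prefix}}},
                    {"range": {"created_time": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "articles": {
                "terms": {
                    "field": "article_id.keyword",
                    "order": {"score": "desc"},
                    "size": 1000,
                },
                "aggs": {"score": {"sum": {"field": "score"}}},
            }
        },
    }


body = recent_articles_query("article-")
print(json.dumps(body, indent=2))
```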

Elasticsearch: Aggregation on filtered nested objects to find unique values

I have an array of objects (tags) in each document in Elasticsearch 5:
{
"tags": [
{ "key": "tag1", "value": "val1" },
{ "key": "tag2", "value": "val2" },
...
]
}
Now I want to find unique tag values for a certain tag key. Something similar to this SQL query:
SELECT DISTINCT(tags.value) FROM tags WHERE tags.key='some-key'
I have come up with this DSL so far:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"filter" : { "terms": { "tags.key": "tag1" } },
"aggs": {
"my_tags_values": {
"terms" : {
"field" : "tags.value",
"size": 9999
}
}
}
}
}
}
}
But it gives me this error:
[terms] unknown field [tags.key], parser not found.
Is this the right approach to solve the problem? Thanks for your help.
Note: I have declared the tags field as a nested field in my mapping.
You mixed things up there. You probably wanted to add a filter aggregation, but you didn't give it a name:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"terms": {
"tags.key": [
"tag1"
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 9999
}
}
}
}
}
}
}
}
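Each named aggregation shows up under its own name in the response, so the distinct values end up three levels deep. The Python sketch below shows how they could be pulled out; `sample_response` is a made-up, trimmed response written for illustration (a real one carries more fields), and `distinct_tag_values` is a hypothetical helper name.

```python
# Made-up, trimmed response for the request above; real responses carry more fields.
sample_response = {
    "aggregations": {
        "my_tags": {
            "doc_count": 5,
            "my_filter": {
                "doc_count": 3,
                "my_tags_values": {
                    "buckets": [
                        {"key": "val1", "doc_count": 2},
                        {"key": "val3", "doc_count": 1},
                    ]
                },
            },
        }
    }
}


def distinct_tag_values(response):
    """Follow the aggregation names down to the terms buckets and return their keys."""
    buckets = (
        response["aggregations"]["my_tags"]["my_filter"]["my_tags_values"]["buckets"]
    )
    return [bucket["key"] for bucket in buckets]


print(distinct_tag_values(sample_response))
```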
Try a bool query inside the filter aggregation:
{
"size": 0,
"aggs": {
"my_tags": {
"nested": {
"path": "tags"
},
"aggs": {
"my_filter": {
"filter": {
"bool": {
"must": [
{
"term": {
"tags.key": "tag1"
}
}
]
}
},
"aggs": {
"my_tags_values": {
"terms": {
"field": "tags.value",
"size": 0
}
}
}
}
}
}
}
}
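BTW: if you want to retrieve all buckets, you can write 0 instead of 9999 as the aggregation size. Note that this only works on Elasticsearch 1.x and 2.x; support for size 0 in the terms aggregation was removed in 5.0.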

ElasticSearch aggregations using filter and without it

I'm building a product list page with filters. There are a lot of filters, and the data for them is computed in ES with aggregations.
The simplest example is min/max price:
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
}
}
}
So this request returns the minimum and maximum price according to the rules set in the filter (CategoryId 36898, shop_id 44, etc.).
It works perfectly.
The question is: is it possible to change this request to also get the aggregations without the filters? Or is it possible to return aggregation data computed with a different filter in the same request?
So I want:
min_price and max_price for the filtered data (query 1)
and min_price and max_price for the unfiltered data (or for data filtered with query 2).
You can use the global aggregation to compute aggregations that ignore any filters given in the query block.
For example, for your query, use the following JSON input.
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"shop_id": 44
}
},
{
"term": {
"CategoryId": 36898
}
},
{
"term": {
"products_status": 1
}
},
{
"term": {
"availability": 3
}
}
]
}
}
}
},
"aggs": {
"min_price": {
"min": {
"field": "products_price"
}
},
"max_price": {
"max": {
"field": "products_price"
}
},
"without_filter_min": {
"global": {},
"aggs": {
"price_value": {
"min": {
"field": "products_price"
}
}
}
},
"without_filter_max": {
"global": {},
"aggs": {
"price_value": {
"max": {
"field": "products_price"
}
}
}
}
}
}
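For the second half of the question (aggregations under a different filter in the same request), a filter aggregation can be nested inside the global one. The Python sketch below builds only the aggs section; `min_max`, `price_aggs`, and the bucket names `all_products` and `query2` are hypothetical names chosen for illustration, and the single global bucket holding both metrics is my own simplification.

```python
import json

PRICE_FIELD = "products_price"


def min_max(field):
    """Return a min and a max metric aggregation on the given field."""
    return {
        "min_price": {"min": {"field": field}},
        "max_price": {"max": {"field": field}},
    }


def price_aggs(second_filter=None):
    """Min/max scoped to the main query, plus min/max that ignore the main query."""
    aggs = min_max(PRICE_FIELD)      # scoped to the request's query
    unscoped = min_max(PRICE_FIELD)  # will be computed over all documents
    if second_filter is not None:
        # Optionally narrow the global scope again with a different filter.
        unscoped = {"query2": {"filter": second_filter, "aggs": unscoped}}
    aggs["all_products"] = {"global": {}, "aggs": unscoped}
    return aggs


aggs = price_aggs(second_filter={"term": {"shop_id": 44}})
print(json.dumps(aggs, indent=2))
```

Calling `price_aggs()` without an argument gives the plain unfiltered variant.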

Using aggregation with filters in elastic search

I have an Elasticsearch instance running with documents like this one:
{
id: 1,
price: 620000,
propertyType: "HO",
location: {
lat: 51.41999,
lon: -0.14426
},
active: true,
rentOrSale: "S",
}
I'm trying to use aggregations to get statistics about a certain area, and the query I'm using is the following:
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"must": [
{
"term": {
"rentOrSale": "s"
}
},
{
"term": {
"active": true
}
}
]
},
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"distance": "15.0mi",
"location": {
"lat": 51.50735,
"lon": -0.12776
}
}
}
]
}
}
},
"aggs": {
"propertytype_agg": {
"terms": {
"field": "propertyType"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
},
"bed_agg": {
"terms": {
"field": "numberOfBedrooms"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
But I can't see the aggregations in the result. As soon as I remove either the bool or the filtered part of the query, I can see the aggregations. I can't figure out why this is happening, nor how to get the aggregations with these filters applied. I've tried using the answer to this question but I've not been able to solve it. Any ideas?
I think your query needs to be slightly rearranged: move "filtered" further up and repeat the "query" key inside it:
"query": {
"filtered": {
"query" : {
"bool": {
...
}
},
"filter": {
...
}
}
}
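Filled in with the clauses from the question, the rearranged request could be assembled as in the Python sketch below. `filtered_query` is a hypothetical helper, only one of the two aggregations is repeated for brevity, and the single-element "and" wrapper around geo_distance is dropped, which is my own simplification.

```python
import json


def filtered_query(query, filter_):
    """Pre-5.0 'filtered' query: one scoring query plus one non-scoring filter."""
    return {"filtered": {"query": query, "filter": filter_}}


body = {
    "sort": [{"id": "desc"}],
    "query": filtered_query(
        query={
            "bool": {
                "must": [
                    {"term": {"rentOrSale": "s"}},
                    {"term": {"active": True}},
                ]
            }
        },
        filter_={
            "geo_distance": {
                "distance": "15.0mi",
                "location": {"lat": 51.50735, "lon": -0.12776},
            }
        },
    ),
    "aggs": {
        "propertytype_agg": {
            "terms": {"field": "propertyType"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}

print(json.dumps(body, indent=2))
```

The point of the rearrangement is that "query" ends up with exactly one child, "filtered", instead of "bool" and "filtered" side by side.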

ElasticSearch - significant term aggregation with range

I would like to know how I can add a range to a significant terms aggregation query. For example:
{
"query": {
"terms": {
"text_content": [
"searchTerm"
]
},
"range": {
"dateField": {
"from": "date1",
"to": "date2"
}
}
},
"aggregations": {
"significantQTypes": {
"significant_terms": {
"field": "field1",
"size": 10
}
}
},
"size": 0
}
will not work. Any suggestions on how to specify the range?
Instead of using a range query, use a range filter, as relevance/score doesn't seem to matter in your case.
Then, in order to combine your query with a range filter, you should use a filtered query (see documentation).
Try something like this:
{
"query": {
"filtered": {
"query": {
"terms": {
"text_content": [
"searchTerm"
]
}
},
"filter": {
"range": {
"dateField": {
"from": "date1",
"to": "date2"
}
}
}
}
},
"aggs": {
"significantQTypes": {
"significant_terms": {
"field": "field1",
"size": 10
}
}
},
"size": 0
}
Hope this helps!
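A note for readers on newer versions: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause takes its place. The Python sketch below shows one way to translate the query part of the body above; `filtered_to_bool` is a hypothetical helper and covers only the simple query-plus-filter shape used in this thread.

```python
import json


def filtered_to_bool(query):
    """Rewrite {"filtered": {"query": q, "filter": f}} as a bool query."""
    inner = query["filtered"]
    bool_query = {}
    if "query" in inner:
        bool_query["must"] = [inner["query"]]     # scoring part
    if "filter" in inner:
        bool_query["filter"] = [inner["filter"]]  # non-scoring part
    return {"bool": bool_query}


old_query = {
    "filtered": {
        "query": {"terms": {"text_content": ["searchTerm"]}},
        "filter": {"range": {"dateField": {"from": "date1", "to": "date2"}}},
    }
}

new_query = filtered_to_bool(old_query)
print(json.dumps({"query": new_query, "size": 0}, indent=2))
```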

Resources