Document count aggregation via query in Elasticsearch (like facet.query in Solr)

I have a main query and I need the number of matches for a couple of sub-queries.
In Solr terms, I need a facet.query. What I am missing is a simple doc_count aggregation, analogous to the value_count aggregation.
Any suggestions?
I found two possible solutions, neither of which I like:
1. Use a filter aggregation with a value_count metric on _id.
Example:
GET _search
{
"query": {
"match_main": {}
},
"aggs": {
"facetvalue1": {
"filter": {
"bool": {
"should": [
{"match": { "name": "fred" }},
{"term": { "lastname": "krueger" }}
]
}
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
},
"facetvalue2": {
"filter": {
"term": { "name": "freddy" }
},
"aggs": {
"count": {
"value_count": {
"field": "_id"
}
}
}
}
}
}
2. Use the Multi Search API.
Example:
GET _msearch
{"index":"myindex"}
{"query":{"match_main": {}}}
{"index":"myindex"}
{"size": 0, "query":{"match_main": {}}, "filter": {"bool": {"should":[{"match": { "name": "fred" }},{"term": { "lastname": "krueger" }}]}}}
{"index":"myindex"}
{"size": 0, "query":{"match_main": {}},"filter": {"term": { "name": "freddy" }}}
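The _msearch body is newline-delimited JSON. Each header and each request body must sit on a single line, and the payload must end with a newline. The following is a minimal Python sketch that serializes header/body pairs that way. The helper name to_msearch_ndjson is hypothetical, match_all stands in for the complex main query, and actually sending the payload is left out:

```python
import json

def to_msearch_ndjson(requests):
    """Serialize (header, body) pairs into an _msearch payload.

    Each JSON object is written on its own line (json.dumps without indent
    never emits a raw newline), and the payload ends with a trailing newline.
    """
    lines = []
    for header, body in requests:
        lines.append(json.dumps(header, separators=(",", ":")))
        lines.append(json.dumps(body, separators=(",", ":")))
    return "\n".join(lines) + "\n"

# The main query has to be repeated in every body, which is what makes this
# approach costly when the main query is complex.
main_query = {"match_all": {}}
payload = to_msearch_ndjson([
    ({"index": "myindex"}, {"query": main_query}),
    ({"index": "myindex"}, {"size": 0, "query": main_query}),
])
print(payload)
```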
I see that solution 2 is faster, but imagine match_main being a complex query that has to be repeated in every request!
So I would prefer solution 1 if there were a doc_count:{} metric instead of value_count:{"field":"_id"}.
But back to my basic question: what is the counterpart of Solr's facet.query in Elasticsearch?

You can use a filters aggregation for this. Note the additional s: it is a different aggregation from the filter aggregation you already mentioned.
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"values": {
"filters": {
"filters": {
"value1": {
"bool": {
"should": [
{
"match": {
"name": "fred"
}
},
{
"term": {
"lastname": "krueger"
}
}
]
}
},
"value2": {
"term": {
"name": "freddy"
}
}
}
}
}
}
}
This will return something like
"aggregations": {
"values": {
"buckets": {
"value1": {
"doc_count": 4
},
"value2": {
"doc_count": 1
}
}
}
}
Edit: As a general note, you don't have to add a metric aggregation under a bucket aggregation. If you don't provide any sub-aggregations, each bucket still reports its document count (doc_count). In this case the filters aggregation provides the buckets, but several separate filter aggregations would work as well.
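To use this as a facet.query replacement from client code, you can generate the request body from a mapping of bucket names to queries, then read the counts back out of the keyed buckets. The following is a Python sketch with two hypothetical helpers, facet_query_body and facet_counts. It assumes the aggregation is named values, as in the example above:

```python
def facet_query_body(main_query, facet_queries, agg_name="values"):
    """Build a search body that returns only per-facet document counts.

    facet_queries maps a bucket name to any query clause (like Solr's
    facet.query); the filters aggregation counts matching documents
    within the result set of main_query.
    """
    return {
        "query": main_query,
        "size": 0,  # only the counts are needed, not the hits
        "aggs": {agg_name: {"filters": {"filters": dict(facet_queries)}}},
    }

def facet_counts(response, agg_name="values"):
    """Extract {bucket_name: doc_count} from a keyed filters response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {name: bucket["doc_count"] for name, bucket in buckets.items()}

body = facet_query_body(
    {"match_all": {}},
    {
        "value1": {"bool": {"should": [
            {"match": {"name": "fred"}},
            {"term": {"lastname": "krueger"}},
        ]}},
        "value2": {"term": {"name": "freddy"}},
    },
)

# Response shape copied from the sample output above
sample = {"aggregations": {"values": {"buckets": {
    "value1": {"doc_count": 4},
    "value2": {"doc_count": 1},
}}}}
print(facet_counts(sample))  # {'value1': 4, 'value2': 1}
```

Because the filters are given as a named object rather than an array, the buckets come back keyed by those names. That keeps the mapping back to each sub-query explicit.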

Related

Elasticsearch: how can I perform a "terms" and a "range" query together?

In Elasticsearch, I am successfully using a terms query to search for multiple IDs in one query.
My original terms query:
{
"query": {
"terms": {
"Id": ["134","156"]
}
}
}
However, I need to add an extra condition, like the following:
{
"query": {
"terms": {
"id": ["163","121","569","579"]
},
"range":{
"age":
{"gt":10}
}
}
}
The list of IDs can be a long array.
You can combine both queries using a bool query:
{
"query": {
"bool": {
"must": [
{
"terms": {
"Id": [
"134",
"156"
]
}
},
{
"range": {
"age": {
"gt": 10
}
}
}
]
}
}
}
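Neither condition needs to contribute to relevance scoring, so the same two clauses can also go in the bool query's filter clause (available since Elasticsearch 2.0). That selects the same documents without computing scores. The following is a small Python sketch that builds this variant. The helper name ids_with_min_age is hypothetical, and the field names are taken from the question:

```python
def ids_with_min_age(ids, min_age):
    """Build a bool query matching Id in ids AND age > min_age, without scoring."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # terms matches any of the listed values
                    {"terms": {"Id": list(ids)}},
                    # strict lower bound, like the question's "gt": 10
                    {"range": {"age": {"gt": min_age}}},
                ]
            }
        }
    }

query = ids_with_min_age(["163", "121", "569", "579"], 10)
```

For very long ID lists, keep in mind that Elasticsearch 7.x caps a terms query at index.max_terms_count terms (65,536 by default).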

Elasticsearch 2.1: Intersection of aggregations

I have some sample data in Elasticsearch that looks like this:
Data1: {
"name": "rahul",
"socialnetwork": "facebook",
"day": 1
}
Data2: {
"name": "rahul",
"searchengine": "google",
"day": 1
}
Data3: {
"name": "vivek",
"socialnetwork": "facebook",
"day": 1
}
Data4: {
"name": "devendra",
"searchengine": "google",
"day": 2
}
Data5: {
"name": "rahul",
"socialnetwork": "facebook",
"day": 2
}
I need an aggregation on the "name" field, restricted to names that have both a document with socialnetwork = "facebook" and a document with searchengine = "google".
As far as I know, we can run two aggregations and take the intersection of their results.
First aggregation:
{
"query": {
"match": {
"searchengine": "google"
}
},
"aggs": {
"searcheng": {
"terms": {
"field": "name"
}
}
}
}
Second aggregation:
{
"query": {
"match": {
"socialnetwork": "facebook"
}
},
"aggs": {
"socialnet": {
"terms": {
"field": "name"
}
}
}
}
Then take the common names (i.e. the intersection) from both aggregations.
But I am not able to compute that intersection in Elasticsearch.
I have tried many things: sub-aggregations don't help in this case, significant_terms results are not good enough, and I also tried filters and pipeline aggregations, but couldn't find a solution.
The sample data above is just a simplified version of a large dataset; in reality there are around 20 filters, not two.
No, you don't need to intersect two aggregations.
This can easily be achieved with a bool query. For your desired output, you can use should clauses:
{
"query": {
"bool": {
"should": [
{
"match": {
"searchengine": "google"
}
},
{
"match": {
"socialnetwork": "facebook"
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"searcheng": {
"terms": {
"field": "name",
"min_doc_count" :2
}
}
}
}
Hope it helps.
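One caveat: min_doc_count: 2 only guarantees that a name matched at least two documents, not one of each kind. A name with two facebook documents and no google document would also be returned. The following plain-Python sketch (not Elasticsearch code) runs over the sample data and contrasts that heuristic with a true intersection. The sixth document is hypothetical:

```python
from collections import Counter, defaultdict

# Sample documents from the question (Data1..Data5)
docs = [
    {"name": "rahul", "socialnetwork": "facebook", "day": 1},
    {"name": "rahul", "searchengine": "google", "day": 1},
    {"name": "vivek", "socialnetwork": "facebook", "day": 1},
    {"name": "devendra", "searchengine": "google", "day": 2},
    {"name": "rahul", "socialnetwork": "facebook", "day": 2},
]

def should_min_doc_count(docs, min_doc_count=2):
    """Mimic the answer: OR the two conditions, keep names with >= N matches."""
    counts = Counter(
        d["name"] for d in docs
        if d.get("searchengine") == "google" or d.get("socialnetwork") == "facebook"
    )
    return {name for name, c in counts.items() if c >= min_doc_count}

def true_intersection(docs):
    """Keep names with at least one google doc AND at least one facebook doc."""
    seen = defaultdict(set)
    for d in docs:
        if d.get("searchengine") == "google":
            seen[d["name"]].add("google")
        if d.get("socialnetwork") == "facebook":
            seen[d["name"]].add("facebook")
    return {name for name, kinds in seen.items() if kinds == {"google", "facebook"}}

print(sorted(should_min_doc_count(docs)), sorted(true_intersection(docs)))  # ['rahul'] ['rahul']

# Hypothetical extra document: vivek now has two facebook docs and no google doc
more = docs + [{"name": "vivek", "socialnetwork": "facebook", "day": 2}]
print(sorted(should_min_doc_count(more)))  # ['rahul', 'vivek'] (vivek is a false positive)
print(sorted(true_intersection(more)))     # ['rahul']
```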

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
In SQL, what I would like to do is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
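To pin down what the Elasticsearch query has to compute, here is a plain-Python reference version of the SQL above (not Elasticsearch code). It assumes that the time of day is read in each timestamp's own UTC offset and that now is a naive local time. The rows are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

def sum_orders_until_now(rows, now):
    """Reference semantics for:

    select date(timestamp), sum(orders) from data
    where time(timestamp) < time(now) group by date(timestamp)
    """
    totals = defaultdict(int)
    for ts_string, orders in rows:
        # %z accepts offsets written as +0500, as in the question's format
        ts = datetime.strptime(ts_string, "%Y-%m-%dT%H:%M:%S%z")
        # .time() drops the date (and tzinfo), so only the time of day is compared
        if ts.time() < now.time():
            totals[ts.date()] += orders
    return dict(totals)

rows = [
    ("2015-03-20T09:00:00+0500", 3),
    ("2015-03-20T16:00:00+0500", 5),  # later than 15:00, so excluded
    ("2015-03-21T14:30:00+0500", 2),
]
now = datetime(2015, 3, 22, 15, 0, 0)  # fixed "now" so the result is reproducible
print(sum_orders_until_now(rows, now))
```

The WHERE clause compares only the time of day, so every date gets the same midnight-to-now window. A date_histogram on its own groups by date but does not express that condition.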
I know I need an aggregation, but for now I've tried the basic search query below, and I'm getting a malformed-query error:
{
"size": 0,
"query":
{
"filtered":
{
"query":
{
"match_all" : {}
},
"filter":
{
"range":
{
"@timestamp":
{
"from": "00:00:01.000",
"to": "15:00:00.000"
}
}
}
}
}
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like:
{
"query": {"match_all": {}},
"aggs": {
"by_date": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"order_sum": {
"sum": {"field": "orders"}
}
}
}
}
}
First you have a bucketing aggregation that groups your documents by date; inside it, a metric aggregation computes a value (in this case a sum) for each bucket.
This would return data of the form:
{
...
"aggregations": {
"by_date": {
"buckets": [
{
"key_as_string": "2015-03-01T00:00:00.000Z",
"key": 1425168000000,
"doc_count": 8644,
"order_sum": {
"value": 1234
}
},
{
"key_as_string": "2015-03-02T00:00:00.000Z",
"key": 1425254400000,
"doc_count": 8819,
"order_sum": {
"value": 45678
}
},
...
]
}
}
}
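On the client side, each bucket maps directly onto one row of the SQL result. The following is a small Python sketch that flattens the response into (date, doc_count, sum) rows. The helper name histogram_rows is hypothetical, and the sketch assumes the aggregation names by_date and order_sum from the query above:

```python
def histogram_rows(response, agg_name="by_date", metric="order_sum"):
    """Flatten date_histogram buckets into (day, doc_count, sum) tuples."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        day = bucket["key_as_string"][:10]  # "2015-03-01T00:00:00.000Z" -> "2015-03-01"
        rows.append((day, bucket["doc_count"], bucket[metric]["value"]))
    return rows

# The two buckets shown in the sample response above
sample = {"aggregations": {"by_date": {"buckets": [
    {"key_as_string": "2015-03-01T00:00:00.000Z", "key": 1425168000000,
     "doc_count": 8644, "order_sum": {"value": 1234}},
    {"key_as_string": "2015-03-02T00:00:00.000Z", "key": 1425254400000,
     "doc_count": 8819, "order_sum": {"value": 45678}},
]}}}
print(histogram_rows(sample))
# [('2015-03-01', 8644, 1234), ('2015-03-02', 8819, 45678)]
```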
There is a good intro to aggregations on the Elasticsearch blog (part 1 and part 2) if you want to do some more reading.

Using aggregation with filters in elastic search

I have an elastic search running with documents like this one:
{
"id": 1,
"price": 620000,
"propertyType": "HO",
"location": {
"lat": 51.41999,
"lon": -0.14426
},
"active": true,
"rentOrSale": "S"
}
I'm trying to use aggregations to get statistics about a certain area. The query I'm using is the following:
{
"sort": [
{
"id": "desc"
}
],
"query": {
"bool": {
"must": [
{
"term": {
"rentOrSale": "s"
}
},
{
"term": {
"active": true
}
}
]
},
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"distance": "15.0mi",
"location": {
"lat": 51.50735,
"lon": -0.12776
}
}
}
]
}
}
},
"aggs": {
"propertytype_agg": {
"terms": {
"field": "propertyType"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
},
"bed_agg": {
"terms": {
"field": "numberOfBedrooms"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
But in the result I can't see the aggregations. As soon as I remove either the bool or the filtered part of the query, I can see the aggregations. I can't figure out why this is happening, nor how to get the aggregations for these filters. I've tried using the answer to this question, but I haven't been able to solve it. Any ideas?
I think your query needs to be slightly rearranged. A query object holds exactly one query type, so bool and filtered cannot be siblings. Move "filtered" up so it wraps everything, and put the bool under its own "query" key:
"query": {
"filtered": {
"query" : {
"bool": {
...
}
},
"filter": {
...
}
}
}
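Filled in with the clauses from the question, the rearranged 1.x-style body looks like legacy_body below. The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, as was the and filter. On newer versions, the same conditions go into a single bool query, with the geo filter in its filter clause, as in modern_body. The following Python sketch builds both request bodies. The aggs are omitted for brevity; they stay at the top level unchanged:

```python
must_clauses = [
    {"term": {"rentOrSale": "s"}},
    {"term": {"active": True}},
]
geo = {"geo_distance": {"distance": "15.0mi",
                        "location": {"lat": 51.50735, "lon": -0.12776}}}

# Elasticsearch 1.x: bool goes inside filtered.query, geo_distance in filtered.filter
legacy_body = {
    "sort": [{"id": "desc"}],
    "query": {
        "filtered": {
            "query": {"bool": {"must": must_clauses}},
            "filter": {"and": [geo]},
        }
    },
}

# Elasticsearch 5.0+: filtered is gone; the geo filter moves into bool.filter
modern_body = {
    "sort": [{"id": "desc"}],
    "query": {"bool": {"must": must_clauses, "filter": [geo]}},
}
```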

Multiple filters and an aggregate in elasticsearch

How can I use a filter together with an aggregation in Elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL; compare it with, e.g., the PostgreSQL documentation.
Through trial and error I found the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
{
"filter": {
"and": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398169707,
"to": 1400761707
}
}
}
]
},
"size": 0,
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
Some people suggest using a query instead of a filter, but the official documentation generally recommends the opposite for filtering on exact values. Another issue with queries: filters offer an and, but queries do not.
Can somebody point me to documentation, a blog or a book that describes writing non-trivial queries, with at least an aggregation plus multiple filters?
I ended up using a filter aggregation, not a filtered query, so I now have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
My final implementation:
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_type": "logs"
}
},
{
"term": {
"dc": "eu-west-12"
}
},
{
"term": {
"status": "204"
}
},
{
"range": {
"@timestamp": {
"from": 1398176502000,
"to": 1400768502000
}
}
}
]
}
},
"aggs": {
"time_histo": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"aggs": {
"name": {
"percentiles": {
"field": "upstream_response_time",
"percents": [
98.0
]
}
}
}
}
}
}
},
"size": 0
}
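With more conditions the nesting gets deep quickly, so it can help to generate the body. The following Python sketch uses a hypothetical helper, filtered_aggs, to rebuild the final implementation above from a dict of term conditions and one time range. It assumes the timestamp field is named @timestamp:

```python
def filtered_aggs(terms, range_field, start, end, inner_aggs, name="filtered"):
    """Wrap inner_aggs in a filter aggregation whose bool filter ANDs one
    term clause per (field, value) pair with a single range clause."""
    must = [{"term": {field: value}} for field, value in terms.items()]
    # gte/lte give the same inclusive bounds as the answer's from/to
    must.append({"range": {range_field: {"gte": start, "lte": end}}})
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {name: {"filter": {"bool": {"must": must}}, "aggs": inner_aggs}},
    }

inner = {
    "time_histo": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"},
        "aggs": {"name": {"percentiles": {"field": "upstream_response_time",
                                          "percents": [98.0]}}},
    }
}
body = filtered_aggs(
    {"_type": "logs", "dc": "eu-west-12", "status": "204"},
    "@timestamp", 1398176502000, 1400768502000, inner,
)
```

The body has no query, so it defaults to match_all. The filter aggregation alone therefore decides which documents feed the histogram.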
Put your filter in a filtered query.
The top-level filter only filters search hits, not facets/aggregations. It was renamed to post_filter in 1.0 because of this common confusion.
Also, you might want to look into this post on why you often want to use bool and not and/or: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
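The following is a toy Python model (not Elasticsearch code) of why the top-level filter appeared to be ignored. The query restricts both the hits and the aggregations. A post_filter (the old top-level filter) is applied only after the aggregations have been computed, so it trims the hits alone. The documents are hypothetical:

```python
from collections import Counter

docs = [
    {"dc": "eu-west-12", "status": 204},
    {"dc": "eu-west-12", "status": 500},
    {"dc": "us-east-1", "status": 204},
]

def search(docs, query, post_filter, agg_field):
    """Model how Elasticsearch scopes query versus post_filter."""
    matched = [d for d in docs if query(d)]          # query: restricts hits AND aggs
    aggs = Counter(d[agg_field] for d in matched)     # aggs see every query match
    hits = [d for d in matched if post_filter(d)]     # post_filter: restricts hits only
    return hits, dict(aggs)

# Condition in post_filter: hits shrink, the aggregation does not
hits, aggs = search(docs, query=lambda d: True,
                    post_filter=lambda d: d["status"] == 204, agg_field="dc")
print(len(hits), aggs)  # 2 {'eu-west-12': 2, 'us-east-1': 1}

# Same condition in the query: both hits and aggregation shrink
hits, aggs = search(docs, query=lambda d: d["status"] == 204,
                    post_filter=lambda d: True, agg_field="dc")
print(len(hits), aggs)  # 2 {'eu-west-12': 1, 'us-east-1': 1}
```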
Building on @geekQ's answer: to filter on strings that contain spaces, across multiple fields, use match_phrase:
{ "aggs": {
"aggresults": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"term_1": "some text with space 1"
}
},
{
"match_phrase": {
"term_2": "some text with also space 2"
}
}
]
}
},
"aggs" : {
"all_term_3s" : {
"terms" : {
"field":"term_3.keyword",
"size" : 10000,
"order" : {
"_term" : "asc"
}
}
}
}
} }, "size": 0 }
Just for reference, as of version 7.2, I used something like the following to apply multiple filters to an aggregation.
A filter aggregation restricts the documents that reach the inner aggregation, and a bool query inside it builds the compound condition.
POST movies/_search?size=0
{
"size": 0,
"aggs": {
"test": {
"filter": {
"bool": {
"must": {
"term": {
"genre": "action"
}
},
"filter": {
"range": {
"year": {
"gte": 1800,
"lte": 3000
}
}
}
}
},
"aggs": {
"year_hist": {
"histogram": {
"field": "year",
"interval": 50
}
}
}
}
}
}
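The histogram aggregation puts each value in the bucket whose key is the value rounded down to a multiple of interval (with the default offset of 0). The following is a small Python sketch of that bucketing rule, using the same interval of 50 as above and hypothetical year values:

```python
import math
from collections import Counter

def histogram_key(value, interval, offset=0):
    """Bucket key rule of the histogram aggregation:
    floor((value - offset) / interval) * interval + offset."""
    return math.floor((value - offset) / interval) * interval + offset

years = [1899, 1900, 1949, 1950, 1999, 2012]  # hypothetical movie years
counts = Counter(histogram_key(y, 50) for y in years)
print(sorted(counts.items()))  # [(1850, 1), (1900, 2), (1950, 2), (2000, 1)]
```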
