Elasticsearch aggregation using a bool filter

I have the following query, which works fine on Elasticsearch 1.x but does not work on 2.x (I get doc_count: 0) since the bool filter has been deprecated. It's not quite clear to me how to rewrite this query using the new Bool Query.
{
"aggregations": {
"events_per_period": {
"filter": {
"bool": {
"must": [
{
"terms": {
"message.facility": [
"facility1",
"facility2",
"facility3"
]
}
}
]
}
}
}
},
"size": 0
}
Any help is greatly appreciated.

I think you might want an aggregation on multiple fields combined with a filter.
Here I assume a filter on id and aggregations on facility1 and facility2.
{
"_source":false,
"query": {
"match": {
"id": "value"
}
},
"aggregations": {
"byFacility1": {
"terms": {
"field": "facility1"
},
"aggs": {
"byFacility2": {
"terms": {
"field": "facility2"
}
}
}
}
}
}
If you want to aggregate on three fields, check the link.
For a Java implementation, see link2.

Related

Elasticsearch aggregations significant_text without query block returns zero buckets

I want to learn elasticsearch and I am following this guide:
https://github.com/LisaHJung/Part-2-Understanding-the-relevance-of-your-search-with-Elasticsearch-and-Kibana-
This command worked as described in the guide; it returns significant_text buckets:
GET news_headlines/_search
{
"query": {
"match": {
"category": "ENTERTAINMENT"
}
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
I thought I'd explore by trying to find significant_text against ALL documents in my index, but both of these attempts gave me zero buckets:
GET news_headlines/_search
{
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
GET news_headlines/_search
{
"query": {
"match_all": { }
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
What did I do wrong? Or is there something about aggregations that I don't understand?

Filter out terms aggregation buckets in elasticsearch after applying aggregation

Below is a snapshot of the dataset ("-" marks an empty field):
recordNo  employeeId  employeeStatus  employeeAddr
1         employeeA   Permanent       -
2         employeeA   -               ABC
3         employeeB   Contract        -
4         employeeB   -               CDE
I want to get the list of employees along with employeeStatus and employeeAddr.
So I am using terms aggregation on employeeId and then using sub-aggregations of employeeStatus and employeeAddr to get these details.
The query below returns the results correctly.
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
Now I want only the employees whose status is Permanent, so I am applying a filter aggregation.
{
"aggregations": {
"filter_Employee_employeeID": {
"filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {"query": "Permanent"}
}
}
]
}
},
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
}
}
}
}
}
}
}
Now the problem is that the employeeAddr aggregation returns no buckets for employeeA because record 2 gets filtered out before the aggregation is done.
Assuming that I cannot modify the data set and I want to achieve the result with a single elastic query, how can I do it?
I checked the Bucket Selector pipeline aggregation but it only works for metric aggregations.
Is there a way to filter out term buckets after the aggregation is applied?
If I understood correctly you want to preserve the aggregations even if you use some kind of filter. To achieve that, try using the post_filter clause.
You can check the docs here
The clause is applied "outside" the aggregation. Using your example, it should look like this:
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeID"
},
"aggregations": {
"employeeStatus": {
"terms": {
"field": "employeeStatus"
}
},
"employeeAddr": {
"terms": {
"field": "employeeAddr"
}
}
}
}
},
"post_filter": {
"bool": {
"must": [
{
"match": {
"employeeStatus": {
"query": "Permanent"
}
}
}
]
}
}
}
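To illustrate what "outside the aggregation" means, here is a minimal in-memory sketch in Python. It is not Elasticsearch itself: `search`, `query` and `post_filter` are hypothetical names chosen for illustration, and the records are the four sample rows from the question. The sketch assumes that aggregations are computed from every document the query matches, while the post filter only narrows the hits that are returned.

```python
from collections import Counter

# Sample records from the question; empty fields are simply absent.
RECORDS = [
    {"recordNo": 1, "employeeId": "employeeA", "employeeStatus": "Permanent"},
    {"recordNo": 2, "employeeId": "employeeA", "employeeAddr": "ABC"},
    {"recordNo": 3, "employeeId": "employeeB", "employeeStatus": "Contract"},
    {"recordNo": 4, "employeeId": "employeeB", "employeeAddr": "CDE"},
]


def search(docs, query, post_filter):
    """Hypothetical stand-in for a search request that uses post_filter."""
    matched = [d for d in docs if query(d)]
    # Aggregations see every document matched by the query...
    buckets = Counter(d["employeeId"] for d in matched)
    # ...while post_filter only narrows the hits that are returned.
    hits = [d for d in matched if post_filter(d)]
    return hits, dict(buckets)


hits, buckets = search(
    RECORDS,
    query=lambda d: True,  # no query clause, so everything matches
    post_filter=lambda d: d.get("employeeStatus") == "Permanent",
)
print([h["recordNo"] for h in hits])  # [1]
print(buckets)  # {'employeeA': 2, 'employeeB': 2}
```

In this model the employeeB bucket is still present, so post_filter keeps the address data available to the aggregation but does not by itself remove non-permanent employees from the buckets.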
Combining the include parameter of the terms aggregation with a bucket_selector on the bucket count gives you the desired result.
Filtering term values is here.
Bucket selector using document count is here
The subtlety is that, although bucket_selector needs numeric values, buckets_path can also reference special paths that Elasticsearch provides, such as _bucket_count.
{
"aggregations": {
"Employee": {
"terms": {
"field": "employeeId.keyword"
},
"aggregations": {
"employeeStatus": {
"terms": {"field": "employeeStatus", "include": "Permanent"}
},
"employeeAddr": {
"terms": {"field": "employeeAddr"}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "employeeStatus._bucket_count"
},
"script": {
"source": "params.count != 0"
}
}
}
}
}
}
}
I tested this on 7.10 and it worked, returning only employeeA, with the address included.
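The logic of that query can be sketched in plain Python to show why only employeeA survives. This is an in-memory model under stated assumptions, not the Elasticsearch implementation: `employees_with_status` is a hypothetical helper, and the records are the sample rows from the question.

```python
from collections import defaultdict

# Sample records from the question; empty fields are simply absent.
RECORDS = [
    {"recordNo": 1, "employeeId": "employeeA", "employeeStatus": "Permanent"},
    {"recordNo": 2, "employeeId": "employeeA", "employeeAddr": "ABC"},
    {"recordNo": 3, "employeeId": "employeeB", "employeeStatus": "Contract"},
    {"recordNo": 4, "employeeId": "employeeB", "employeeAddr": "CDE"},
]


def employees_with_status(docs, wanted_status):
    """Hypothetical model of terms + include + bucket_selector."""
    # Outer terms aggregation: one bucket per employeeId.
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["employeeId"]].append(doc)

    result = {}
    for employee, members in groups.items():
        # Sub-aggregation with "include": only the wanted status forms a bucket.
        status_buckets = {
            d["employeeStatus"]
            for d in members
            if d.get("employeeStatus") == wanted_status
        }
        # bucket_selector: keep the parent bucket only if a status bucket exists.
        if len(status_buckets) != 0:
            # The address sub-aggregation still sees all records of the employee.
            result[employee] = sorted(
                d["employeeAddr"] for d in members if "employeeAddr" in d
            )
    return result


print(employees_with_status(RECORDS, "Permanent"))  # {'employeeA': ['ABC']}
```

Because the selection happens per parent bucket rather than per document, record 2 is never filtered out, which is why the address is still returned.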

Elasticsearch scoped aggregation not desired results

I have the following query but the aggregation doesn't seem to be acting on top of the query.
The query returns 3 results, but there are 10 items in the aggregation. It looks like the aggregation is acting on all matching documents rather than on the returned results.
Basically, how do I get the aggregation to take the given query as the input?
{
"query": {
"filtered": {
"filter": {
"and": [
{
"geo_distance": {
"coordinates": [
-79.3931,
43.6709
],
"distance": "15km"
}
},
{
"term": {
"user.type": "2"
}
}
]
},
"query": {
"match": {
"user.shoes": "314"
}
}
}
},
"aggs": {
"dedup": {
"terms": { "field": "user.id" },
"aggs": {
"dedup_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
}
As it turns out, I was expecting the aggregation to act on the paginated results returned by the query, and that expectation was incorrect.
The aggregation takes all results of the query as input, not just the current page.
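The difference can be sketched with a small in-memory model in Python. This is an illustration only: `search`, `size` and `offset` are hypothetical stand-ins for a search request and its paging parameters, and the documents are made up so that the numbers match the 3 hits and 10 buckets described above.

```python
from collections import Counter


def search(docs, query, size, offset=0):
    """Hypothetical model: paging limits the hits, not the aggregation input."""
    matched = [d for d in docs if query(d)]
    page = matched[offset:offset + size]  # what comes back as hits
    buckets = Counter(d["user_id"] for d in matched)  # built from all matches
    return page, dict(buckets)


# Made-up data: 30 matching documents spread over 10 distinct users.
DOCS = [{"user_id": i % 10, "shoes": "314"} for i in range(30)]

page, buckets = search(DOCS, lambda d: d["shoes"] == "314", size=3)
print(len(page), len(buckets))  # 3 10
```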

Elasticsearch Aggregation Scope Issue

I have an Elasticsearch index with more than 100 million records.
If I run the query below, the response (1 record) comes back within 1 second:
{
"query": {
"bool": {
"must":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
}
}
But if I run the query below, the response takes more than 2 minutes:
{
"query": {
"bool": {
"must":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}
I don't understand why adding the aggregation makes the query so slow, given that the condition _id = a36403af960840b86452bf1a6bd42fde3b4773e0 matches only 1 record.
My assumption is that Elasticsearch applies the aggregation to the output of the query, so it should aggregate over 1 record and respond within about 1 second, almost the same as without aggs.
How can I fix this issue?
I am using Elasticsearch version 1.5.
It's a good example where you need to consider choosing filter context over query context.
Try running the same query using filter as shown below:
GET my-index/_search
{
"query": {
"bool": {
"filter":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}
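The only change in the request above is that the term clause moved from must to filter. As a rough sketch, a hypothetical Python helper `to_filter_context` can apply the same rewrite to a request body held as a dict. It assumes the top-level query is a bool query, that relevance scores from the moved clauses are not needed, and that the cluster runs a version whose bool query accepts a filter clause (2.0 or later, as far as I know).

```python
import copy


def to_filter_context(body):
    """Return a copy of `body` with bool.must clauses moved to bool.filter.

    Hypothetical helper: assumes body["query"]["bool"] exists.
    """
    rewritten = copy.deepcopy(body)
    bool_query = rewritten["query"]["bool"]

    def as_list(clauses):
        # The query DSL accepts a single clause object or a list of them.
        return clauses if isinstance(clauses, list) else [clauses]

    must = as_list(bool_query.pop("must", []))
    existing = as_list(bool_query.get("filter", []))
    bool_query["filter"] = existing + must
    return rewritten


body = {
    "query": {
        "bool": {
            "must": {"term": {"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"}}
        }
    },
    "aggs": {"mywordcloud": {"terms": {"field": "post.content_terms"}}},
}

rewritten = to_filter_context(body)
print(rewritten["query"])
```

The helper copies the body first, so the original request stays unchanged and both variants can be timed against each other.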
My first suggestion is to upgrade.
I tried your second query on 1.7.2 and it was very fast, so I think upgrading will solve your issue.
Second suggestion (I am not sure it will work with Elasticsearch 1.5):
Try this query:
{
"query": {
"constant_score": {
"filter":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}
OR
{
"aggregations": {
"bylife": {
"terms": {
"field": "post.content_terms"
},
"aggregations": {
"bylife2": {
"filter": {
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
}
}
}
}
I know this will give different data, but you can adapt your logic to this approach.

elasticsearch filter aggs by doc count

I have a query that counts the number of images per user:
GET images/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"appID.raw": "myApp"
}
}
]
}
},
"size": 0,
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
}
}
}
}
It basically works fine, but I would like to exclude all aggregation results for users that have fewer than 200 images. How can I tweak the query above to achieve this?
Thanks.
You can achieve this with the min_doc_count (Minimum Document Count) option of the terms aggregation.
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID",
"min_doc_count": 200
}
}
}
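The effect of min_doc_count in the snippet above can be modelled in a few lines of Python. This is only an in-memory sketch: `terms_agg` is a hypothetical function and the device IDs and counts are made up for illustration.

```python
from collections import Counter


def terms_agg(values, min_doc_count=1):
    """Hypothetical model: count values, keep buckets with enough documents."""
    counts = Counter(values)
    return {key: n for key, n in counts.items() if n >= min_doc_count}


# Made-up data: one image document per list entry.
device_ids = ["dev1"] * 250 + ["dev2"] * 199 + ["dev3"] * 200

print(terms_agg(device_ids, min_doc_count=200))  # {'dev1': 250, 'dev3': 200}
```

The threshold is inclusive in this sketch, so a device with exactly 200 images is kept while one with 199 is dropped.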
Add a filter aggregation around your terms aggregation, with the filter clause inside it.
Filter Aggregations
You can modify your query above to look like this.
{
"query": {
"bool": {
"must": [
{
"term": {
"appID.raw": "myApp"
}
}
]
}
},
"size": 0,
"aggs": {
"filtered_users_with_images_count": {
"filter": {
"term": {
"count": 200
}
},
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
}
}
}
}
}
}
You can modify the filter inside filtered_users_with_images_count to match documents whose image count is greater than 200.
Please also consider posting your data mappings along with the query to support your question.
