Elasticsearch sub-aggregation with a condition

I have a table with columns like:
ID | Biz Name | License # | Violations | ...
I need to find the businesses that have 5 or more violations.
I have the following:
{
"query": {
"bool": {
"must": {
"match": {
"violations": {
"query": "MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
},
"must_not": {
"match": {
"violations": {
"query": "NO MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
}
}
},
"aggs" : {
"selected_bizs" :{
"terms" : {
"field" : "Biz Name.keyword",
"min_doc_count": 5,
"size" :1000
},
"aggs": {
"top_biz_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
It seems to work.
Now I need to find the businesses that have 5 or more violations (like above) and also have 3 or more license #s.
I am not sure how to aggregate this further.
Thanks!

Let's assume that your License # field is defined just like the Biz Name and has a .keyword mapping.
Now, the statement:
find the businesses that have ... 3 or more license #s
can be rephrased as:
aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.
With that being said, you can use the cardinality aggregation to get distinct License IDs.
Secondly, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which executes a script to determine whether the currently iterated bucket will be retained in the final aggregation.
Leveraging both of these in tandem would mean:
POST your-index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "violations": {
            "query": "MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      },
      "must_not": {
        "match": {
          "violations": {
            "query": "NO MICE DROPPINGS were OBSERVED",
            "operator": "and"
          }
        }
      }
    }
  },
  "aggs": {
    "selected_bizs": {
      "terms": {
        "field": "Biz Name.keyword",
        "min_doc_count": 5,
        "size": 1000
      },
      "aggs": {
        "top_biz_hits": {
          "top_hits": {
            "size": 10
          }
        },
        "unique_license_ids": {
          "cardinality": {
            "field": "License #.keyword"
          }
        },
        "must_have_min_3_License #s": {
          "bucket_selector": {
            "buckets_path": {
              "unique_license_ids": "unique_license_ids"
            },
            "script": "params.unique_license_ids >= 3"
          }
        }
      }
    }
  }
}
And that's all there is to it!
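To make the order of operations concrete, here is a minimal in-memory sketch in Python of what the three pieces do together: terms with min_doc_count, cardinality, and bucket_selector. The sample documents and the field names biz_name and license are hypothetical stand-ins for documents that already matched the query part. Note that the real cardinality aggregation is approximate, whereas this sketch counts exactly.

```python
from collections import defaultdict

# Hypothetical documents that already matched the "MICE DROPPINGS" query.
docs = (
    [{"biz_name": "Acme Deli", "license": lic} for lic in ["L-1", "L-2", "L-3", "L-1", "L-2"]]
    + [{"biz_name": "Bob's Diner", "license": lic} for lic in ["L-7", "L-8"] * 3]
    + [{"biz_name": "Cafe Z", "license": lic} for lic in ["L-4", "L-5", "L-6"]]
)


def select_businesses(docs, min_violations=5, min_licenses=3):
    """Mimic terms(min_doc_count) + cardinality + bucket_selector in memory."""
    buckets = defaultdict(list)
    for doc in docs:
        # terms aggregation: one bucket per business name
        buckets[doc["biz_name"]].append(doc)

    selected = {}
    for name, bucket_docs in buckets.items():
        # min_doc_count: drop buckets with too few matching documents
        if len(bucket_docs) < min_violations:
            continue
        # cardinality: number of distinct license values in the bucket
        unique_licenses = {d["license"] for d in bucket_docs}
        # bucket_selector: keep the bucket only if the script condition holds
        if len(unique_licenses) >= min_licenses:
            selected[name] = {
                "doc_count": len(bucket_docs),
                "unique_license_ids": len(unique_licenses),
            }
    return selected


print(select_businesses(docs))
```

In this sample, Bob's Diner has enough violations but only two licenses, and Cafe Z has three licenses but too few violations, so only Acme Deli survives both conditions.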

Related

Elasticsearch Ranking on aggregation based on AND and OR

I have a query with multiple keywords with an aggregation on author ID.
I want the ranking to be based on combining must and should.
For example, for the query 'X', 'Y', the authors whose document field contains both 'X' and 'Y' should be ranked higher, followed by the authors who have either 'X' or 'Y'.
Doing either one (AND or OR) on its own is easy; I need an idea of how to achieve both in one ES query.
The current query I have for both X and Y is:
GET /docs/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "X",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
},
{
"query_string": {
"query": "Y",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
Changing must to should turns it into an OR, but I want a combination of both, with the authors matching the must clauses ranked higher in the aggregation.
The usual way of boosting results is to add a should clause that looks for both terms, like this:
GET /docs/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "fulltext": {
              "query": "X Y",
              "operator": "and"
            }
          }
        }
      ],
      "must": [
        {
          "match": {
            "fulltext": {
              "query": "X Y",
              "operator": "or"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "search-users": {
      "terms": {
        "field": "author.id.keyword",
        "size": 200
      },
      "aggs": {
        "top-docs": {
          "top_hits": {
            "size": 100
          }
        }
      }
    }
  }
}
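The effect of combining a must clause (OR) with a should clause (AND) can be illustrated with a toy scoring function in Python. This is only a sketch: the scores are made-up stand-ins for real relevance scores, and the documents, author IDs, and function names are hypothetical. It relies on the fact that a bool query adds up the scores of its matching must and should clauses.

```python
def toy_score(text, terms):
    """Toy bool scoring: the must clause (OR) gates the match, the should clause (AND) adds a bonus."""
    words = set(text.lower().split())
    matched = [t for t in terms if t.lower() in words]
    if not matched:
        # must clause with operator "or": at least one term is required
        return None
    score = float(len(matched))  # toy contribution of the must clause
    if len(matched) == len(terms):
        # should clause with operator "and" matched too, so its score is added
        score += float(len(terms))
    return score


def rank_authors(docs, terms):
    """Rank authors by the best toy score among their matching documents."""
    best = {}
    for doc in docs:
        score = toy_score(doc["fulltext"], terms)
        if score is None:
            continue
        best[doc["author"]] = max(score, best.get(doc["author"], 0.0))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


docs = [
    {"author": "a1", "fulltext": "x y report"},
    {"author": "a2", "fulltext": "only x here"},
    {"author": "a2", "fulltext": "y alone"},
    {"author": "a3", "fulltext": "nothing relevant"},
]
print(rank_authors(docs, ["X", "Y"]))
```

Keep in mind that the terms aggregation orders its buckets by document count by default, so the boosted score changes the order of hits rather than the order of author buckets. One option, assuming it fits your mapping, is a max sub-aggregation over _score that the terms order refers to.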

Elasticsearch - adding a separate query for aggregation

Below is the Elasticsearch query I am using to get the results, along with the filter options for those results from the aggregation. The problem is that whenever someone applies a filter, the overall result changes and hence the filter options also change. I do not want the filter options to change unless the query parameter changes. For now I am making two calls:
Get all results without aggregation
Get all filters by using aggregation and setting the size parameter to 0
This approach uses 2 API requests and hence doubles the time. Can this be done in one request only?
First call: all results without aggregation
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"filter": [
{
"match": {
"is_paid": false
}
}
]
}
},
"sort": [],
"from": 0,
"size": 15
}
Second call: getting filters
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "cooking",
"boost": 2,
"slop": 10
}
}
},
{
"match": {
"title": {
"query": "cooking",
"boost": 1
}
}
}
],
"minimum_should_match": 1
}
},
"size": 0,
"aggs": {
"courseCount": {
"terms": {
"field": "provider",
"size": 100
}
},
"paidCount": {
"terms": {
"field": "is_paid",
"size": 3
}
},
"subjectCount": {
"terms": {
"field": "subject",
"size": 30
}
},
"levelCount": {
"terms": {
"field": "level",
"size": 4
}
},
"pacingCount": {
"terms": {
"field": "pacing_type",
"size": 4
}
}
}
}
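One possible way to merge the two calls, sketched in Python under the assumption that the user-selected filters should narrow the hits but not the facet counts, is to move those filters from the query into post_filter. Elasticsearch applies post_filter to the hits after the aggregations have been calculated. The helper name build_search_body and the FACETS table are hypothetical; the field names and sizes are taken from the two calls above.

```python
import json

# Facet name -> (field, bucket size), copied from the second call above.
FACETS = {
    "courseCount": ("provider", 100),
    "paidCount": ("is_paid", 3),
    "subjectCount": ("subject", 30),
    "levelCount": ("level", 4),
    "pacingCount": ("pacing_type", 4),
}


def build_search_body(text, active_filters, size=15):
    """Build one request: the query drives hits and facets, post_filter narrows the hits only."""
    body = {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {"title": {"query": text, "boost": 2, "slop": 10}}},
                    {"match": {"title": {"query": text, "boost": 1}}},
                ],
                "minimum_should_match": 1,
            }
        },
        "from": 0,
        "size": size,
        "aggs": {
            name: {"terms": {"field": field, "size": n}}
            for name, (field, n) in FACETS.items()
        },
    }
    if active_filters:
        # Applied after the aggregations are computed, so the facet counts stay stable.
        body["post_filter"] = {"bool": {"filter": list(active_filters)}}
    return body


body = build_search_body("cooking", [{"match": {"is_paid": False}}])
print(json.dumps(body, indent=2))
```

The resulting body can be sent as a single search request; the hits respect is_paid while the facet counts are computed from the text query alone.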

Elasticsearch aggregation not being applied to filters

Here is my query. I am trying to get all products that are inside the "men_fashion" and "men_shoes" categories (categories are being used as terms/tags). Then I want to query the whole result set and search for products that have "men boots yellow" in them.
The query below works perfectly fine, but I am not getting the correct aggregation results. It gives me all the brands, whereas I am only interested in the brands of the filtered products.
{
"size": 15,
"from": 0,
"query": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": {
"bool": {
"must": [{
"match": {
"active": 1
}
}, {
"match": {
"category": "men_fashion"
}
}, {
"match": {
"category": "men_shoes"
}
}]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}
I think this might be due to the filter I have applied, but if this is somehow complicated I am OK with using a simple query that would achieve this without the filters.
You're using a post filter instead of a normal query filter. Try it like this instead:
{
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": [
{
"match": {
"active": 1
}
},
{
"match": {
"category": "men_fashion"
}
},
{
"match": {
"category": "men_shoes"
}
}
]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}
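The difference between the two placements can be sketched with a small in-memory simulation in Python. The products and helper functions below are hypothetical; the point is only the order of operations: with a post filter the brands are counted before the filter runs, while with a query filter they are counted afterwards.

```python
from collections import Counter

# Hypothetical products; "category" holds the tags used in the filter.
products = [
    {"name": "yellow boots", "brand": "A", "category": ["men_fashion", "men_shoes"], "active": 1},
    {"name": "yellow boots kids", "brand": "B", "category": ["kids"], "active": 1},
    {"name": "men boots", "brand": "C", "category": ["men_fashion", "men_shoes"], "active": 0},
]


def matches_query(product):
    # query_string with the default OR operator: any of the terms is enough
    return any(word in product["name"].split() for word in ("men", "boots", "yellow"))


def passes_filter(product):
    return product["active"] == 1 and {"men_fashion", "men_shoes"} <= set(product["category"])


def search_with_post_filter(products):
    hits = [p for p in products if matches_query(p)]
    brands = Counter(p["brand"] for p in hits)    # aggregated BEFORE the filter
    hits = [p for p in hits if passes_filter(p)]  # the filter only trims the hits
    return hits, brands


def search_with_query_filter(products):
    hits = [p for p in products if matches_query(p) and passes_filter(p)]
    brands = Counter(p["brand"] for p in hits)    # aggregated AFTER the filter
    return hits, brands


print(search_with_post_filter(products)[1])
print(search_with_query_filter(products)[1])
```

Both variants return the same hits, but only the query filter variant restricts the brand counts to the filtered products.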

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need it to return rows with unique usernames. Currently, it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You would need to use a terms aggregation to get all unique users and then a top_hits aggregation to get only one result for each user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the unique_user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only the aggregation results are returned. One important thing to remember is that your user IDs have to be unique, i.e. ABC and abc will be considered different users; you might have to make your userid field not_analyzed to be sure about that.
Hope this helps!!
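As a rough illustration of what terms plus top_hits with size 1 returns, here is a small in-memory sketch in Python. The hits, scores, and post labels are hypothetical; it assumes userid is treated as an exact value, so ABC and abc end up in separate buckets, and it keeps the highest-scoring hit per bucket, which matches the default top_hits sort by score.

```python
def one_post_per_user(hits):
    """Keep only the best-scoring hit per exact userid value."""
    best = {}
    for hit in hits:
        uid = hit["userid"]  # exact value: "ABC" and "abc" are different keys
        if uid not in best or hit["score"] > best[uid]["score"]:
            best[uid] = hit
    return best


hits = [
    {"userid": "ABC", "score": 2.0, "post": "p1"},
    {"userid": "ABC", "score": 3.5, "post": "p2"},
    {"userid": "abc", "score": 1.0, "post": "p3"},
]

for uid, hit in one_post_per_user(hits).items():
    print(uid, hit["post"])
```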

Elasticsearch Aggregation Word Count with using Stopwords

I'm using Elasticsearch to store my data. I want to count the words in my documents, but I want to see the result without the stopwords. For example, in my current result 'and' is my top word, and I want to remove it. Currently I have 3802 stopwords in my stopword.txt, and I don't want any of them to be shown in the aggregation result. How can I do that? My current query:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
The way I want the query to work is:
{
"aggs": {
"filtered": {
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "now-0d/d"
}
}
}
]
}
},
"filter": {
"my_stop": {
"type": "stop",
"stopwords_path": "/work/projects/stop_words.txt"
}
},
"aggs": {
"words": {
"terms": {
"size" : 0,
"field": "text"
}
}
}
}
}
}
By the way, I have my stopwords list in my custom analyzer, but it doesn't work the way I want.
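One possible direction, sketched in Python and not verified against this particular index, is to pass the stopword list to the exclude parameter of the terms aggregation, which accepts an array of exact terms. The helper names are hypothetical, size=50 is an arbitrary illustrative value, and the sketch assumes the tokens in the text field are lowercased.

```python
import json


def load_stopwords(lines):
    """Normalise stopword lines: strip whitespace, lowercase, drop blanks and duplicates."""
    return sorted({line.strip().lower() for line in lines if line.strip()})


def build_word_count_body(stopwords, size=50):
    """Same range query as above, with the stopwords excluded from the terms buckets."""
    return {
        "size": 0,
        "query": {"bool": {"must": [{"range": {"date": {"gte": "now-0d/d"}}}]}},
        "aggs": {
            "words": {
                "terms": {
                    "field": "text",
                    "size": size,           # illustrative value, adjust as needed
                    "exclude": stopwords,   # exact terms to leave out of the buckets
                }
            }
        },
    }


# In practice the lines would come from stopword.txt, e.g. open("stopword.txt").
stopwords = load_stopwords(["and\n", "The\n", "\n", "and\n"])
print(json.dumps(build_word_count_body(stopwords), indent=2))
```

With several thousand stopwords the request body gets large, so removing the stopwords at index time through the analyzer may be the better long-term fix if the mapping can be changed.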
