Elasticsearch ranking on aggregation based on AND and OR

I have a query with multiple keywords with an aggregation on author ID.
I want the ranking to be based on combining must and should.
For example, for the query 'X', 'Y', the authors whose document field contains both 'X' and 'Y' should be ranked higher, followed by the authors who have either 'X' or 'Y'.
Doing each of them (AND or OR) on its own is easy; I need an idea or direction for achieving both in one ES query.
The current query I have for both X and Y is:
GET /docs/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "X",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
},
{
"query_string": {
"query": "Y",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
Changing must to should turns it into OR, but I want a combination of both, with the authors that satisfy the must (AND) version ranked higher in the aggregation.

The usual way of boosting results is to add a should clause that requires both terms, next to a must clause that accepts either term, like this:
GET /docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "and"
}
}
}
],
"must": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "or"
}
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
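One caveat: the should clause only raises the _score of individual hits, while a terms aggregation orders its buckets by document count by default. A possible way to carry the boost over to the author ranking is to order the buckets by a max sub-aggregation over _score. The following is a minimal Python sketch that only builds such a request body as a dict. The helper name build_author_ranking_body and the aggregation name best_score are made up for illustration, and the body has not been run against the asker's index.

```python
import json


def build_author_ranking_body(keywords, text_field="fulltext",
                              author_field="author.id.keyword"):
    """Build a search body that orders author buckets by their best hit score.

    Hypothetical helper: the function and the "best_score" name are
    illustrative assumptions, not part of the original question or answer.
    """
    phrase = " ".join(keywords)
    return {
        "size": 0,
        "query": {
            "bool": {
                # A document must contain at least one keyword (OR).
                "must": [
                    {"match": {text_field: {"query": phrase, "operator": "or"}}}
                ],
                # Documents that contain every keyword (AND) score higher.
                "should": [
                    {"match": {text_field: {"query": phrase, "operator": "and"}}}
                ],
            }
        },
        "aggs": {
            "search-users": {
                "terms": {
                    "field": author_field,
                    "size": 200,
                    # Order buckets by the sub-aggregation, not by doc count.
                    "order": {"best_score": "desc"},
                },
                "aggs": {
                    # Highest _score among the documents of each author.
                    "best_score": {"max": {"script": {"source": "_score"}}},
                    "top-docs": {"top_hits": {"size": 100}},
                },
            }
        },
    }


body = build_author_ranking_body(["X", "Y"])
print(json.dumps(body, indent=2))
```

With this ordering an author is ranked by their single best document, so authors with a document containing both terms would usually, though not necessarily always, come before authors who only match one term.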

Related

Why am I getting different results for the same exact query when I aggregate it in elasticsearch?

So I have this query and I am trying to aggregate on a certain field, but when I use the same query in the aggregation I don't get the expected results:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "TEST",
"fields": ["TESTFIELD1", "TESTFIELD2"],
"lenient": true,
"default_operator": "OR"
}
}
]
}
},
"aggs": {
"All": {
"global": {},
"aggs": {
"TESTAGG": {
"filter": {
"bool": {
"must": [
{
"query_string": {
"query": "TEST",
"fields": ["TESTFIELD1", "TESTFIELD2"],
"lenient": true,
"default_operator": "OR"
}
}
]
}
},
"aggs": {
"subs": {
"terms": {
"field": "TESTFIELD1",
"size": 100,
"order": { "_term": "asc" }
}
}
}
}
}
}
}
}
The issue is that the aggregation returns values for TESTFIELD1 that don't exist in the hits of the main query, and I am not sure why. Any ideas?

Elasticsearch sub-aggregation with a condition

I have database table columns like:
ID | Biz Name | License # | Violations | ...
I need to find the businesses that have 5 or more violations.
I have the following:
{
"query": {
"bool": {
"must": {
"match": {
"violations": {
"query": "MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
},
"must_not": {
"match": {
"violations": {
"query": "NO MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
}
}
},
"aggs" : {
"selected_bizs" :{
"terms" : {
"field" : "Biz Name.keyword",
"min_doc_count": 5,
"size" :1000
},
"aggs": {
"top_biz_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
It seems to work.
Now I need to find the businesses that have 5 or more violations (like above) and also have 3 or more license #s.
I am not sure how to aggregate this further.
Thanks!
Let's assume that your License # field is defined just like the Biz Name and has a .keyword mapping.
Now, the statement:
find the businesses that have ... 3 or more license #s
can be rephrased as:
aggregate by the business name under the condition that the number of distinct values of the bucketed license IDs is greater than or equal to 3.
With that being said, you can use the cardinality aggregation to get the number of distinct License IDs.
Secondly, the mechanism for "aggregating under a condition" is the handy bucket_selector aggregation, which executes a script to determine whether the currently iterated bucket will be retained in the final aggregation.
Leveraging both of these in tandem would mean:
POST your-index/_search
{
"size": 0,
"query": {
"bool": {
"must": {
"match": {
"violations": {
"query": "MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
},
"must_not": {
"match": {
"violations": {
"query": "NO MICE DROPPINGS were OBSERVED",
"operator": "and"
}
}
}
}
},
"aggs": {
"selected_bizs": {
"terms": {
"field": "Biz Name.keyword",
"min_doc_count": 5,
"size": 1000
},
"aggs": {
"top_biz_hits": {
"top_hits": {
"size": 10
}
},
"unique_license_ids": {
"cardinality": {
"field": "License #.keyword"
}
},
"must_have_min_3_License #s": {
"bucket_selector": {
"buckets_path": {
"unique_license_ids": "unique_license_ids"
},
"script": "params.unique_license_ids >= 3"
}
}
}
}
}
}
and that's all there is to it!
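To see what the combination does to the buckets, here is a toy in-memory model in Python. It is not Elasticsearch code: the field names biz_name and license and the sample documents are invented for illustration, and it counts distinct values exactly, whereas the real cardinality aggregation is approximate for large sets.

```python
from collections import defaultdict


def select_businesses(docs, min_violations=5, min_licenses=3):
    """Toy model of terms(min_doc_count) + cardinality + bucket_selector."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["biz_name"]].append(doc)  # one bucket per business name

    selected = {}
    for name, hits in buckets.items():
        doc_count = len(hits)                                # size of the terms bucket
        unique_licenses = len({h["license"] for h in hits})  # cardinality
        # bucket_selector keeps a bucket only when its script returns true.
        if doc_count >= min_violations and unique_licenses >= min_licenses:
            selected[name] = {
                "doc_count": doc_count,
                "unique_license_ids": unique_licenses,
            }
    return selected


# Invented sample data: each document is one matching violation record.
docs = (
    [{"biz_name": "Cafe A", "license": f"L{i % 3}"} for i in range(6)]  # 6 docs, 3 licenses
    + [{"biz_name": "Cafe B", "license": "L9"} for _ in range(7)]       # 7 docs, 1 license
    + [{"biz_name": "Cafe C", "license": f"L{i}"} for i in range(4)]    # 4 docs, 4 licenses
)
print(select_businesses(docs))
# {'Cafe A': {'doc_count': 6, 'unique_license_ids': 3}}
```

Only Cafe A survives: Cafe B has enough violations but a single license, and Cafe C has enough licenses but too few violations.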

How to convert an Elasticsearch query to ES7

We are having a tremendous amount of trouble converting an old Elasticsearch query to a newer version of Elasticsearch. The original query for ES 1.8 is:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"and": [
{
"terms": {
"organization_id": [
"fred"
]
}
}
]
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 0,
"field": "status"
}
},
"tags": {
"terms": {
"size": 0,
"field": "tags"
}
}
}
}
and we are trying to convert it to ES version 7. Does anyone know how to do that?
The Elasticsearch docs for the filtered query in 6.8 (the latest version of the docs I can find that has the page) state that you should move the query and filter to the must and filter parameters of the bool query.
Also, the terms aggregation no longer supports setting size to 0 to get Integer.MAX_VALUE. If you really want all the terms, you need to set it to the max value (2147483647) explicitly. However, the documentation for size recommends using the composite aggregation instead and paginating through the results.
Below is the closest query I could make to the original that will work with Elasticsearch 7.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "*",
"default_operator": "AND"
}
},
"filter": {
"terms": {
"organization_id": [
"fred"
]
}
}
}
},
"size": 50,
"sort": {
"updated": "desc"
},
"aggs": {
"status": {
"terms": {
"size": 2147483647,
"field": "status"
}
},
"tags": {
"terms": {
"size": 2147483647,
"field": "tags"
}
}
}
}
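If you go the composite route instead of the huge size, the pagination loop could look roughly like the Python sketch below. It is written against a generic search callable that takes a request body and returns the parsed response, so it makes no assumption about which client library you use. The names iter_composite_buckets, all_values and fake_search are invented for this example, and fake_search is only an in-memory stand-in so the loop can be tried without a cluster.

```python
def iter_composite_buckets(search, field, page_size=1000, query=None):
    """Yield every bucket of a composite aggregation, one page at a time.

    `search` is any callable that takes a request body (dict) and returns
    the parsed JSON response (dict); it stands in for a real client call.
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{field: {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last key seen
        body = {"size": 0, "aggs": {"all_values": {"composite": composite}}}
        if query is not None:
            body["query"] = query
        agg = search(body)["aggregations"]["all_values"]
        buckets = agg["buckets"]
        if not buckets:
            break
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            break


def fake_search(body):
    """In-memory stand-in that pages through three made-up status values."""
    values = ["closed", "open", "pending"]
    comp = body["aggs"]["all_values"]["composite"]
    start = 0
    if "after" in comp:
        start = values.index(comp["after"]["status"]) + 1
    page = values[start:start + comp["size"]]
    result = {"buckets": [{"key": {"status": v}, "doc_count": 1} for v in page]}
    if page:
        result["after_key"] = {"status": page[-1]}
    return {"aggregations": {"all_values": result}}


keys = [b["key"]["status"]
        for b in iter_composite_buckets(fake_search, "status", page_size=2)]
print(keys)  # ['closed', 'open', 'pending']
```

In a real setup you would pass the bool query from above as the query argument and run the loop once per field (status, tags).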

Elasticsearch aggregation not being applied to filters

Here is my query. I am trying to get all products that are in the "men_fashion" and "men_shoes" categories (categories are used as terms/tags). Then I want to query the whole result set and search for products that have "men boots yellow" in them.
The query below works perfectly fine, but I am not getting the correct aggregation results. It gives me all the brands, whereas I am only interested in the brands of the filtered products.
{
"size": 15,
"from": 0,
"query": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": {
"bool": {
"must": [{
"match": {
"active": 1
}
}, {
"match": {
"category": "men_fashion"
}
}, {
"match": {
"category": "men_shoes"
}
}]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}
I think this might be due to the filter I have applied, but if this is somehow complicated, I am OK with using a simple query that would achieve this without the filters.
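You're using a post filter instead of a normal query filter. A post filter is applied only after the aggregations have been computed, so the brands aggregation sees every document that matches the query. Try it like this instead: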
{
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"query_string": {
"query": "men boots yellow"
}
},
"filter": [
{
"match": {
"active": 1
}
},
{
"match": {
"category": "men_fashion"
}
},
{
"match": {
"category": "men_shoes"
}
}
]
}
},
"aggs": {
"brands": {
"terms": {
"size": 100,
"field": "brand"
}
}
}
}
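The difference between the two placements can be illustrated with a small in-memory model in Python. This is not Elasticsearch code, just a sketch of the order of operations; the sample products, the brand names and the simplified matching functions are all invented for the example.

```python
from collections import Counter

# Invented sample data.
products = [
    {"name": "yellow boots", "brand": "Acme", "category": "men_shoes"},
    {"name": "yellow boots", "brand": "Bolt", "category": "women_shoes"},
    {"name": "yellow scarf", "brand": "Core", "category": "men_fashion"},
]


def matches_query(doc):
    """Simplified stand-in for the query_string part."""
    return "yellow" in doc["name"]


def matches_filter(doc):
    """Simplified stand-in for the category filter."""
    return doc["category"].startswith("men_")


def search_with_post_filter(docs):
    """Aggregations see every query match; the filter only trims the hits."""
    matched = [d for d in docs if matches_query(d)]
    brands = Counter(d["brand"] for d in matched)
    hits = [d for d in matched if matches_filter(d)]
    return hits, brands


def search_with_query_filter(docs):
    """The filter is part of the query, so hits and aggregations agree."""
    matched = [d for d in docs if matches_query(d) and matches_filter(d)]
    brands = Counter(d["brand"] for d in matched)
    return matched, brands


hits, brands = search_with_post_filter(products)
print(len(hits), sorted(brands))   # 2 ['Acme', 'Bolt', 'Core']
hits, brands = search_with_query_filter(products)
print(len(hits), sorted(brands))   # 2 ['Acme', 'Core']
```

Both variants return the same two hits, but only the second one keeps the unwanted brand out of the aggregation.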

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, though I need to make it return rows with unique usernames. Currently, it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You would need to use a terms aggregation to get all unique users and then a top_hits aggregation to get only one result for each user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the user aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only the aggregation results are returned. One important thing to remember is that your user ids have to be unique, i.e. ABC and abc will be considered different users; you might have to make your userid field not_analyzed (a keyword field in Elasticsearch 5 and later) to be sure about that.
Hope this helps!!
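For intuition, the effect of terms plus top_hits with size 1 can be modelled in a few lines of Python. This is a toy model rather than Elasticsearch code; the sample hits and the score field are invented, and it assumes the default top_hits behaviour of keeping the highest-scoring document per bucket.

```python
def one_post_per_user(hits):
    """Keep the best-scoring hit per userid, like terms + top_hits(size=1)."""
    best = {}
    for hit in hits:
        user = hit["userid"]  # compared exactly, so "ABC" and "abc" differ
        if user not in best or hit["score"] > best[user]["score"]:
            best[user] = hit
    return best


# Invented sample hits.
hits = [
    {"userid": "ABC", "score": 2.0, "gtitle": "voice coach"},
    {"userid": "ABC", "score": 3.5, "gtitle": "voice actor"},
    {"userid": "abc", "score": 1.0, "gtitle": "voice over"},
]
result = one_post_per_user(hits)
print(sorted(result))            # ['ABC', 'abc']
print(result["ABC"]["gtitle"])   # voice actor
```

The two ABC posts collapse into one, while abc stays a separate user because the ids are compared exactly.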
