Why does Elasticsearch score these documents the way it does? - elasticsearch

I have a query where I'm trying to pull documents out of my index and sort them by a date. Additionally, if a document's ID matches a provided one, I boost that result.
When I run my query, I notice that some of the documents with a more recent sort date are not at the top of the results, because Elasticsearch gives them a different score than other documents. As a result, my result order is incorrect. I don't see anything in my query that could be affecting the score. Does anyone have any idea what's happening?
Here's the query I'm using:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"language.keyword": {
"query": "english",
"operator": "OR",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"functions": [
{
"filter": {
"match": {
"id": {
"query": "ID1",
"operator": "OR",
"boost": 1
}
}
},
"weight": 10
}
],
"score_mode": "multiply",
"boost_mode": "multiply",
"boost": 1
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"sortDate": {
"order": "desc"
}
}
]
}

Related

Elastic Search query with should returning 10.000 results but nothing matches

I have an index of about 60GB of data, and I want to write a query that retrieves one specific product based on its reference number.
Here is my query:
GET myindex/_search
{
"_source": [
"product.ref",
"product.urls.*",
"product.i18ns.*.title",
"product_sale_elements.quantity",
"product_sale_elements.prices.*.price",
"product_sale_elements.listen_price.*",
"product.images.image_url",
"product.image_count",
"product.images.visible",
"product.images.position"
],
"size": "6",
"from": "0",
"query": {
"function_score": {
"functions": [
{
"field_value_factor": {
"field": "product.sales_count",
"missing": 0,
"modifier": "log1p"
}
},
{
"field_value_factor": {
"field": "product.image_count",
"missing": 0,
"modifier": "log1p"
}
},
{
"field_value_factor": {
"field": "featureCount",
"missing": 0,
"modifier": "log1p"
}
}
],
"query": {
"bool": {
"filter": [
{
"term": {
"product.is_visible": true
}
}
],
"should": [
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
]
}
}
}
},
"aggs": {
"by_categories": {
"terms": {
"field": "categories.i18ns.de_DE.title.raw",
"size": 100
}
}
}
}
My question is: why does this query return 10k results when I only wanted the single product with that reference number?
If I do:
GET my-index/_search
{
"query": {
"match": {
"product.ref": "13141000"
}
}
}
it matches correctly. How is should different from a normal match query?
If you have must or filter clauses, as you do, then anything that matches must or filter does not have to match your should clause, since should is considered "optional" in that case.
You can either move the query_string from your should clause to filter, or set minimum_should_match to 1, like this:
...
"should": [
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
],
"minimum_should_match" : 1,
...
Must - The condition must match, and it contributes to the score.
Should - If the condition matches, it improves the score in a non-filter context, but it is not required to match (when must or filter clauses are present and minimum_should_match is not declared explicitly).
Filter - The condition must match, like must, but it does not contribute to the score.
You can put this clause inside a new must clause:
{
"query_string": {
"default_field": "product.ref",
"query": "13141000",
"boost": 2
}
}
Boost will not affect scoring if you put the above inside the filter clause.
Read more in the Elasticsearch documentation on bool queries.
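To make the "optional should" rule concrete, here is a toy model in Python. It is only a sketch, not how Elasticsearch is implemented: the function names and parameters are made up for illustration, and the only behaviour modelled is the default for minimum_should_match (0 when a must or filter clause is present, 1 when the bool query has should clauses only).

```python
def default_minimum_should_match(has_must_or_filter, should_count):
    """Default number of should clauses a document has to match."""
    if should_count == 0:
        return 0
    return 0 if has_must_or_filter else 1


def bool_matches(required_ok, should_hits, should_count,
                 has_must_or_filter, minimum_should_match=None):
    """Toy model: does a document match a bool query?

    required_ok          -- True if every must/filter clause matched
    should_hits          -- number of should clauses the document matched
    should_count         -- number of should clauses in the query
    has_must_or_filter   -- whether the query has any must/filter clause
    minimum_should_match -- explicit setting, or None for the default
    """
    if minimum_should_match is None:
        minimum_should_match = default_minimum_should_match(
            has_must_or_filter, should_count)
    return required_ok and should_hits >= minimum_should_match


# A visible product with a different reference number:
# it passes the filter but misses the query_string in should.
print(bool_matches(True, 0, 1, True))                          # True
print(bool_matches(True, 0, 1, True, minimum_should_match=1))  # False
```

The first call is the situation in the question: every visible product matches, which is where the 10k results come from. The second call shows the effect of setting minimum_should_match to 1.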

how to add filters to elastic query when using function_score?

Here is my current elastic query:
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "ocean",
"fields": [],
"fuzziness": "AUTO"
}}],
"must_not": [{
"exists": {
"field": "parentId"
}
}]
}
},
"functions" : [
{
"gauss": {
"createTime": {
"origin": "2020-07-09T23:50:00",
"scale": "365d",
"decay": 0.3
}
}
}
]
}
}
}
How do I properly add filters to this? I think maybe the fact that I'm using function_score makes this different? I would like to add a hard filter, for example, only show me results with uploadUser: 'Mr. Bean' ... but still keep the scoring in place for the results that pass this filter.
I tried using filter in various places, also using must but I either get no results or all the results.
I'm using Elastic Search 7. Thanks for your help
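You can add the filter to the bool query inside function_score, as in the search query below. Filter clauses restrict the results without contributing to the score, so the scoring from multi_match and the gauss function stays in place.
Refer to the official Elasticsearch documentation to learn more about the function score query.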
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"term": {
"uploadUser": "Mr. Bean"
}
},
"must": [
{
"multi_match": {
"query": "ocean",
"fields": [
],
"fuzziness": "AUTO"
}
}
],
"must_not": [
{
"exists": {
"field": "parentId"
}
}
]
}
},
"functions": [
{
"gauss": {
"createTime": {
"origin": "2020-07-09T23:50:00",
"scale": "365d",
"decay": 0.3
}
}
}
]
}
}
}
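If you build the request body in code, the same change can be sketched like this in Python. The helper name add_filter is hypothetical, and the sketch assumes uploadUser is mapped as a keyword field (a term filter compares the exact value, so it would not match against an analyzed text field). The "fields" list from the question is left out here for brevity.

```python
import copy


def add_filter(search_body, field, value):
    """Return a copy of search_body with a term filter added to the
    bool query wrapped by function_score."""
    body = copy.deepcopy(search_body)
    bool_query = body["query"]["function_score"]["query"]["bool"]
    existing = bool_query.get("filter", [])
    if isinstance(existing, dict):  # a single filter may be given as an object
        existing = [existing]
    bool_query["filter"] = existing + [{"term": {field: value}}]
    return body


original = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [{"multi_match": {"query": "ocean",
                                              "fuzziness": "AUTO"}}],
                    "must_not": [{"exists": {"field": "parentId"}}],
                }
            },
            "functions": [
                {"gauss": {"createTime": {"origin": "2020-07-09T23:50:00",
                                          "scale": "365d", "decay": 0.3}}}
            ],
        }
    }
}

filtered = add_filter(original, "uploadUser", "Mr. Bean")
print(filtered["query"]["function_score"]["query"]["bool"]["filter"])
```

The functions list is untouched, so the decay scoring still applies to whatever passes the filter.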

Elasticsearch Ranking on aggregation based on AND and OR

I have a query with multiple keywords with an aggregation on author ID.
I want the ranking to be based on combining must and should.
For example for query 'X', 'Y' the authors containing both 'X' and 'Y' in the document field should be ranked higher, followed by authors who have either 'X' or 'Y'.
Doing each of them (AND/OR) separately is easy, but I need an idea or direction for how to achieve both in one ES query.
The current query I have for both X and Y is:
GET /docs/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "X",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
},
{
"query_string": {
"query": "Y",
"fields": [
"fulltext"
],
"default_operator": "AND"
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
Changing must to should turns it into an OR, but I want a combination of both, with authors who satisfy the must (AND) condition ranked higher in the aggregation.
The usual way of boosting results is by adding a should clause looking for both terms, like this.
GET /docs/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "AND"
}
}
}
],
"must": [
{
"match": {
"fulltext": {
"query": "X Y",
"operator": "OR"
}
}
}
]
}
},
"aggs": {
"search-users": {
"terms": {
"field": "author.id.keyword",
"size": 200
},
"aggs": {
"top-docs": {
"top_hits": {
"size": 100
}
}
}
}
}
}
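Why this ranks "both terms" above "either term" can be illustrated with a toy scoring function in Python. This is a deliberately simplified stand-in, not BM25 or anything Elasticsearch actually computes: toy_score, the sample texts, and the author names are all made up for the example.

```python
def toy_score(text, terms):
    """Toy stand-in for relevance scoring (not BM25).

    must   (operator OR):  required, one point per matching term
    should (operator AND): optional, one bonus point per term, awarded
                           only when every term is present
    Returns None when the must clause does not match at all.
    """
    words = set(text.lower().split())
    hits = sum(1 for t in terms if t.lower() in words)
    if hits == 0:
        return None
    bonus = len(terms) if hits == len(terms) else 0
    return hits + bonus


docs = {
    "author_a": "x and y together",
    "author_b": "only x here",
    "author_c": "nothing relevant",
}
scores = {author: toy_score(text, ["X", "Y"]) for author, text in docs.items()}
ranked = sorted((a for a, s in scores.items() if s is not None),
                key=lambda a: scores[a], reverse=True)
print(ranked)  # ['author_a', 'author_b']
```

Keep in mind that this only changes the scores of the hits. A terms aggregation orders its buckets by document count by default, so the bucket order does not follow the score unless you configure the aggregation's order.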

Elasticsearch - how to know if a particular match condition was hit

Hello elastic experts!
I am new to Elasticsearch. I am trying to build a search query that combines multiple match conditions with OR, and I am boosting each condition differently. But I need a bit more information: I need to know which conditions contributed to each search result. Is there any way to know which match conditions were hit by the query string?
{
"query": {
"bool": {
"should": [
{
"term": {
"title.keyword": {
"value": "Ski trip",
"boost": 1
}
}
},
{
"match_phrase_prefix": {
"title": {
"query": "Ski trip",
"boost": 0.8
}
}
},
{
"match": {
"title": {
"query": "Ski trip",
"operator": "and",
"boost": 0.6
}
}
},
{
"match": {
"description": {
"query": "Ski trip",
"boost": 0.3
}
}
}
]
}
}
}

elasticsearch: Add weight for each match of array

I want to add a weight for each match (instead of adding a weight once if one of those matched):
Having docs like this:
[{
"username": "xyz",
"categories": [
{
"category.id": 1
},
{
"category.id": 2
}
]
}, {
"username": "xyz2",
"categories": [
{
"category.id": 1
}
]
}]
And currently, I have this query:
{
"query": {
"filtered": {
"query": {
"function_score": {
"query": {
"bool": {}
},
"score_mode": "sum",
"boost_mode": "sum",
"functions": [
{
"weight": 1.1,
"filter": {
"terms": {
"category.id": [
1,
2
]
}
}
}
]
}
},
"filter": {
"bool": {
"must_not": [
{
"terms": {
"_id": [
8
]
}
}
]
}
}
}
},
"from": 0,
"size": 30
}
With this query, both entries would receive a single weight of 1.1, but I want the first entry to get 2 * 1.1 because 2 categories are matched. How could I achieve that?
EDIT: Sorry, I forgot to mention the Elasticsearch version. It's 1.7.2.
This might be a bit cumbersome, since the query needs one function per ID, but I don't think there is any other way. Also, notice that your field reference is not complete: it should be categories.category.id. Finally, be careful with dots in field names when upgrading, since their handling has changed across releases.
{
"query": {
"filtered": {
"query": {
"function_score": {
"query": {
"match_all": {}
},
"score_mode": "sum",
"boost_mode": "sum",
"functions": [
{
"weight": 1.1,
"filter": {
"term": {
"categories.category.id": 1
}
}
},
{
"weight": 1.1,
"filter": {
"term": {
"categories.category.id": 2
}
}
}
]
}
},
"filter": {
"bool": {
"must_not": [
{
"terms": {
"_id": [
8
]
}
}
]
}
}
}
},
"from": 0,
"size": 30
}
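Since the functions list needs one entry per ID, it may be easier to generate it. Below is a minimal Python sketch; both helper names are hypothetical, and toy_function_sum only mimics what score_mode "sum" does with the weights of matching functions (it ignores the query score and boost_mode).

```python
def category_weight_functions(category_ids, weight=1.1,
                              field="categories.category.id"):
    """Build one function_score entry per category ID."""
    return [
        {"weight": weight, "filter": {"term": {field: cid}}}
        for cid in category_ids
    ]


def toy_function_sum(doc_category_ids, functions,
                     field="categories.category.id"):
    """Toy model of score_mode "sum": add up the weight of every
    function whose term filter matches one of the document's IDs."""
    total = 0.0
    for fn in functions:
        if fn["filter"]["term"][field] in doc_category_ids:
            total += fn["weight"]
    return total


functions = category_weight_functions([1, 2])
print(len(functions))                        # 2
print(toy_function_sum({1, 2}, functions))   # user "xyz": both categories
print(toy_function_sum({1}, functions))      # user "xyz2": one category
```

With the two documents from the question, the first gets twice the weight of the second, which is the behaviour asked for.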
