Elasticsearch - give negative boost to documents without a certain field

I'm working on a query. The basic filtered multi_match query works as planned and returns the documents I want.
The issue is that I want to boost results that have a certain string field by, e.g., 0.5, or, as in this example, give results that don't have the field 'traded_as' a negative boost of 1.0.
I cannot get the filter / boosting / must / exists / missing combination to work the way I want.
Is this the correct approach to this issue?
Using Elasticsearch 1.5.2.
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "something",
"fields": ["title", "url", "description"]
}
},
"filter": {
"bool": {
"must": {
"missing": {
"field": "marked_for_deletion"
}
}
}
}
}
},
"boosting": {
"positive": {
"filter": {
"bool": {
"must": {
"exists": {
"field": "traded_as"
}
}
}
}
},
"negative": {
"filter": {
"bool": {
"must": {
"missing": {
"field": "traded_as"
}
}
}
}
},
"negative_boost": 1.0
}
}

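You cannot get the desired result this way. As stated in the documentation for the boosting query:
Unlike the "NOT" clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score.
Restructured so that your existing query becomes the positive part of a boosting query, the request would look like this: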
{
"query": {
"boosting": {
"positive": [{
"filtered": {
"query": {
"multi_match": {
"query": "something",
"fields": ["title", "url", "description"]
}
},
"filter": {
"bool": {
"must": [{
"missing": {
"field": "marked_for_deletion"
}
}]
}
}
}
}],
"negative": [{
"filtered": {
"filter": {
"missing": {
"field": "traded_as"
}
}
}
}],
"negative_boost": 1.0
}
}
}
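So you'll still get documents without traded_as, but the other matching documents will score better relative to them. Note that negative_boost is a multiplier applied to documents matching the negative query, so a value of 1.0 leaves their score unchanged; use a value between 0 and 1 (for example 0.5) to actually demote them. The boosting query also gives you no boost for the presence of traded_as. For that, have a look at the function score query: http://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_using_function_score
You would have something like: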
{
  "query": {
    "function_score": {
      "query": {
        "filtered": {
          "query": {
            "multi_match": {
              "query": "something",
              "fields": ["title", "url", "description"]
            }
          },
          "filter": {
            "bool": {
              "must": {
                "missing": {
                  "field": "marked_for_deletion"
                }
              }
            }
          }
        }
      },
      "functions": [{
        "filter": {
          "exists": {
            "field": "traded_as"
          }
        },
        "boost_factor": 2
      }, {
        "filter": {
          "missing": {
            "field": "traded_as"
          }
        },
        "boost_factor": 0.5
      }],
      "score_mode": "first",
      "boost_mode": "multiply"
    }
  }
}
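To see what score_mode "first" and boost_mode "multiply" do to the two groups of documents, here is a small Python model of the function_score query above. It is only a sketch of the semantics, not Elasticsearch code: the helper names (has_field, function_score), the sample documents and the query score of 3.0 are made up for illustration, and a field is assumed to "exist" when it is present and not None.

```python
# Simplified model of function_score with filtered functions.
# Assumption: documents are plain dicts, and a field "exists" when the
# key is present with a non-None value.

def has_field(doc, field):
    return doc.get(field) is not None

# (filter predicate, factor) pairs mirroring the two functions in the query
FUNCTIONS = [
    (lambda d: has_field(d, "traded_as"), 2.0),
    (lambda d: not has_field(d, "traded_as"), 0.5),
]

def function_score(query_score, doc, functions=FUNCTIONS):
    # score_mode "first": only the first function whose filter matches is used
    for matches, factor in functions:
        if matches(doc):
            # boost_mode "multiply": query score times the function value
            return query_score * factor
    # Not reached with the two complementary filters above; this sketch
    # simply leaves the score unchanged in that case.
    return query_score

# Hypothetical documents that both matched the multi_match with score 3.0
listed = {"title": "something", "traded_as": "NYSE: XYZ"}
unlisted = {"title": "something"}

print(function_score(3.0, listed))
print(function_score(3.0, unlisted))
```

With equal query scores, the document that has traded_as ends up four times higher than the one without it, which is the relative boost the two factors (2 and 0.5) were chosen to produce.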

Related

Weighted search on one field and a normal search on other field

I am trying to search by matching the query against either the tag or the name of the document. I also have a filter on top, so I have to use must.
Here is what I have been trying:
{
"query": {
"bool": {
"filter": {
"term": {
"type.primary": "audio"
}
},
"must": [
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"tags.tag": "big"
}
}
]
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
},
{
"bool": {
"must": [
{
"multi_match": {
"query": "big",
"fields": [
"name"
],
"type": "phrase_prefix"
}
}
]
}
}
]
}
}
}
This returns no results.
If I use should instead of must, the query runs, but it returns every document that passes the type.primary: audio filter.
I am pretty sure there is some other way to search the name field. Thanks.
You're almost there! In your must, you declare that both the tags clause and the name clause have to match. Wrap the two in a bool should inside the must, so that at least one of them has to match:
GET /_search
{
"query": {
"bool": {
"filter": {
"term": {
"type.primary": "audio"
}
},
"must": [
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"tags.tag": "big"
}
}
]
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
},
{
"multi_match": {
"query": "big",
"fields": [
"name"
],
"type": "phrase_prefix"
}
}
]
}
}
]
}
}
}
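The difference between the three variants (must with both clauses, should next to a filter, and should nested inside must) can be sketched with a tiny Python evaluator. This is a toy model under stated assumptions, not how Elasticsearch executes queries: leaf queries are replaced by hypothetical boolean flags on the document, and should clauses are treated as optional whenever the same bool has a must or filter clause, which is my understanding of the default behaviour.

```python
def matches(doc, query):
    """Evaluate a tiny subset of bool-query logic against a dict of flags."""
    if "bool" in query:
        clauses = query["bool"]
        required = clauses.get("must", []) + clauses.get("filter", [])
        optional = clauses.get("should", [])
        if not all(matches(doc, q) for q in required):
            return False
        # should clauses only restrict the result when nothing else is required
        if optional and not required:
            return any(matches(doc, q) for q in optional)
        return True
    # Leaf query: a hypothetical flag standing in for term/nested/multi_match
    return bool(doc.get(query["flag"], False))

audio = {"flag": "is_audio"}
tag = {"flag": "tag_is_big"}
name = {"flag": "name_starts_with_big"}

original = {"bool": {"filter": [audio], "must": [tag, name]}}
with_should = {"bool": {"filter": [audio], "should": [tag, name]}}
fixed = {"bool": {"filter": [audio],
                  "must": [{"bool": {"should": [tag, name]}}]}}

tag_only = {"is_audio": True, "tag_is_big": True}   # matches the tag only
unrelated = {"is_audio": True}                      # matches neither clause

for label, q in [("original", original), ("with_should", with_should),
                 ("fixed", fixed)]:
    print(label, matches(tag_only, q), matches(unrelated, q))
```

Only the nested variant keeps the document that matches one clause while dropping the audio document that matches neither.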

Elasticsearch filter exists and function_score

I'm trying to apply a function_score query, along with a filter that returns only those results where a certain field is not null (the exists query), in Elasticsearch 5.4.1.
The scoring functions work as expected, but I'm unsure how to apply the exists query in the context of function_score.
Example:
{
"query": {
"function_score": {
"query": {
"filter": {
"bool": {
"must": {
"exists": {
"field": "name_of_field"
}
}
}
},
"dis_max": {
"queries": [{
"multi_match": {
"query": "term",
"fields": ["name^3", "other_names"]
}
}, {
"match_phrase": {
"known_as": "term"
}
}]
}
},
"functions": [{
"filter": {
"match_phrase_prefix": {
"known_as": "term"
}
},
"weight": 200
}, {
"filter": {
"multi_match": {
"query": "term",
"fields": ["name^3", "other_names"],
"operator": "and"
}
},
"weight": 10
}]
}
}
}
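One possible arrangement, offered as an untested sketch rather than a confirmed fix: the "query" key of function_score takes a single query, so the exists condition and the dis_max can be combined in one bool query, with dis_max under must (it contributes to the score) and exists under filter (it only restricts the result set). The Python below just builds that request body as a dict; build_query is a hypothetical helper name, and nothing here has been run against a 5.4.1 cluster.

```python
import json

def build_query(term, required_field):
    """Build a function_score body whose inner query is a single bool query."""
    text_query = {
        "dis_max": {
            "queries": [
                {"multi_match": {"query": term,
                                 "fields": ["name^3", "other_names"]}},
                {"match_phrase": {"known_as": term}},
            ]
        }
    }
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # scored part: the original dis_max
                        "must": text_query,
                        # non-scoring part: only documents having the field
                        "filter": {"exists": {"field": required_field}},
                    }
                },
                # scoring functions copied from the question unchanged
                "functions": [
                    {"filter": {"match_phrase_prefix": {"known_as": term}},
                     "weight": 200},
                    {"filter": {"multi_match": {
                        "query": term,
                        "fields": ["name^3", "other_names"],
                        "operator": "and"}},
                     "weight": 10},
                ],
            }
        }
    }

body = build_query("term", "name_of_field")
print(json.dumps(body, indent=2))
```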

Elasticsearch 2.3 has_child with must_not.exists query

I am trying to construct a query in which children of a parent are filtered based on a country code in an array. If the child has no countries field I still want to return the result.
I have two working queries:
{
"query": {
"bool": {
"should": {
"has_child": {
"inner_hits": {},
"type": "service",
"score_mode": "sum",
"query": {
"bool": {
"filter": [
{
"term": {
"countries": "AF"
}
}
]
}
}
}
}
}
}
}
duly returns an array of results where 'AF' is in the countries array and:
{
"query":
{
"bool": {
"should": {
"has_child": {
"inner_hits": {},
"type": "service",
"score_mode": "sum",
"query": {
"bool": {
"must_not": {
"exists": {
"field": "countries"
}
}
}
}
}
}
}
}
}
returns the results I want where the child has no countries field.
What I can't figure out is how to combine those two queries to get one combined set of results. That is to say I want to 'OR' the two sets.
I haven't actually tested this; it's a blind suggestion:
{
"query": {
"bool": {
"should": {
"has_child": {
"inner_hits": {},
"type": "service",
"score_mode": "sum",
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"exists": {
"field": "countries"
}
},
{
"term": {
"countries": "AF"
}
}
]
}
},
{
"missing": {
"field": "countries"
}
}
]
}
},
{
"term": {
"whatever": "blabla"
}
}
]
}
}
}
}
}
}
}
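A shorter variant of the same idea, again untested, reuses only the two clauses from the question (term, and must_not exists) and ORs them with a bool should inside the has_child query. The Python below builds that body as a dict and also models the intended matching rule on plain child documents; child_query, parent_query and child_matches are hypothetical helper names, and an empty countries array is assumed to count as missing.

```python
def child_query(country):
    # A bool with only should clauses: at least one of them has to match
    return {"bool": {"should": [
        {"term": {"countries": country}},
        {"bool": {"must_not": {"exists": {"field": "countries"}}}},
    ]}}

def parent_query(country, child_type="service"):
    # Same outer structure as the two working queries in the question
    return {"query": {"bool": {"should": {"has_child": {
        "inner_hits": {},
        "type": child_type,
        "score_mode": "sum",
        "query": child_query(country),
    }}}}}

def child_matches(child, country):
    """Model of the intended rule: no countries at all, or the country listed."""
    countries = child.get("countries")
    return not countries or country in countries

print(parent_query("AF"))
print(child_matches({"name": "global service"}, "AF"))
print(child_matches({"countries": ["GB"]}, "AF"))
```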

Rescoring and Sorting of documents

My goal is to write a query which would rescore documents based on value of a field in the document. To achieve this I was using a rescore query and then sorting the results. However, an explain on the query shows me that the sorting of the documents is done based on the previously computed score and not the new one.
I then found the following, which explains that I can't use rescore and sort together:
"Sometimes we want to show results, where the ordering of the first documents on the page is affected by the additional rules. Unfortunately this cannot be achieved by the rescore functionality. The first idea points to window_size parameter, but this parameter in fact is not connected with the first documents on the result list but with number of results returned on every shard. In addition window_size cannot be less than page size. (If it is less, ElasticSearch silently use page size). Also, one very important thing – rescoring cannot be combined with sorting, because sorting is done after changes introduced by rescoring."
http://elasticsearchserverbook.com/elasticsearch-0-90-using-rescore/
My query is:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"constant_score": {
"query": {
"match": {
"question": {
"query": "diabetes"
}
}
},
"boost": 1
}
},
{
"dis_max": {
"queries": [
{
"constant_score": {
"query": {
"match": {
"question": {
"query": "diabetes"
}
}
},
"boost": 0.01
}
},
{
"constant_score": {
"query": {
"match": {
"answer_text": {
"query": "diabetes"
}
}
},
"boost": 0.0001
}
}
]
}
},
{
"dis_max": {
"queries": [
{
"constant_score": {
"query": {
"match_phrase": {
"question_phrase": {
"query": "what is diabetes",
"slop": 0
}
}
},
"boost": 100
}
},
{
"constant_score": {
"query": {
"match_phrase": {
"question_phrase": {
"query": "what is diabetes",
"slop": 1
}
}
},
"boost": 50
}
},
{
"constant_score": {
"query": {
"match_phrase": {
"question_phrase": {
"query": "what is diabetes",
"slop": 2
}
}
},
"boost": 33
}
},
{
"constant_score": {
"query": {
"match_phrase": {
"question_phrase": {
"query": "what is diabetes",
"slop": 3
}
}
},
"boost": 25
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "question_group_four",
"query": "what__is__diabetes"
}
},
"boost": 0.1
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "question_group_five",
"query": "what__is__diabetes"
}
},
"boost": 0.15
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_20",
"query": "what__is__diabetes"
}
},
"boost": 35
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_15",
"query": "what__is__diabetes"
}
},
"boost": 25
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_10",
"query": "what__is__diabetes"
}
},
"boost": 15
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_20",
"query": "what__is__diabetes"
}
},
"boost": 28
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_15",
"query": "what__is__diabetes"
}
},
"boost": 16
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_10",
"query": "what__is__diabetes"
}
},
"boost": 13
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_05",
"query": "what__is__diabetes"
}
},
"boost": 4
}
}
]
}
},
{
"dis_max": {
"queries": [
{
"constant_score": {
"query": {
"query_string": {
"default_field": "question_group_four",
"query": "diabetes"
}
},
"boost": 0.1
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "question_group_five",
"query": "diabetes"
}
},
"boost": 0.15
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_20",
"query": "diabetes"
}
},
"boost": 35
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_15",
"query": "diabetes"
}
},
"boost": 25
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_no_synonyms_10",
"query": "diabetes"
}
},
"boost": 15
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_20",
"query": "diabetes"
}
},
"boost": 28
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_15",
"query": "diabetes"
}
},
"boost": 16
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_10",
"query": "diabetes"
}
},
"boost": 13
}
},
{
"constant_score": {
"query": {
"query_string": {
"default_field": "concept_words_05",
"query": "diabetes"
}
},
"boost": 4
}
}
]
}
}
],
"disable_coord": true
}
},
"filter": {
"and": [
{
"term": {
"posted_by_expert": false
}
},
{
"term": {
"tip_question": false
}
},
{
"term": {
"show_in_work_queue": true
}
},
{
"range": {
"verified_answers_count": {
"gt": 0
}
}
}
]
}
}
},
"rescore": {
"window_size": 100,
"query": {
"rescore_query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "_score * _source.concierge_boost"
}
}
]
}
}
}
},
"sort": [
"_score",
{
"count_words_with_high_concepts": {
"order": "asc"
}
},
{
"popularity": {
"order": "desc"
}
},
{
"length": {
"order": "asc"
}
}
],
"fields": [],
"size": 10,
"from": 0
}
Any help is highly appreciated!
Indeed, this is not possible. It has been discussed, and the decision was that it is not worth implementing at the moment. The discussion on GitHub reveals the difficulty: documents need to be sorted, the top 100 (in your case) chosen, the rescore applied, and then the results sorted again. I suggest reading the comments in that GitHub issue, especially the ones from simonw. The issue is still open, but it doesn't seem it will be implemented soon, if at all.
As for sorting after another level of scoring: I understand the need to rescore only a few documents, but it doesn't seem to be possible. What if you wrap your query in another function_score and define a script_score function there to compute the final score? Something like this:
{
"query": {
"function_score": {
"query": {
.......
},
"functions": [
{
"script_score": {
"script": "doc['concierge_boost'].value"
}
}
]
}
},
"sort": [
"_score",
{
"count_words_with_high_concepts": {
"order": "asc"
}
},
{
"popularity": {
"order": "desc"
}
},
{
"length": {
"order": "asc"
}
}
],
"fields": [],
"size": 10,
"from": 0
}
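To make the effect of that wrapper concrete, here is a rough Python model of the resulting order. It assumes function_score's default boost_mode ("multiply"), so the final _score is the inner query score times concierge_boost, and it then applies the same sort keys as the request. The sample hits and their values are invented for illustration.

```python
def final_score(query_score, doc):
    # script_score returns doc['concierge_boost'].value; with the default
    # boost_mode "multiply" it is multiplied with the inner query score
    return query_score * doc["concierge_boost"]

def sort_key(hit):
    query_score, doc = hit
    return (
        -final_score(query_score, doc),         # _score, descending
        doc["count_words_with_high_concepts"],  # ascending
        -doc["popularity"],                     # descending
        doc["length"],                          # ascending
    )

# Hypothetical hits: (score of the inner query, document fields)
hits = [
    (10.0, {"id": "A", "concierge_boost": 1.0,
            "count_words_with_high_concepts": 3, "popularity": 5, "length": 40}),
    (8.0, {"id": "B", "concierge_boost": 2.0,
           "count_words_with_high_concepts": 2, "popularity": 7, "length": 30}),
    (16.0, {"id": "C", "concierge_boost": 1.0,
            "count_words_with_high_concepts": 1, "popularity": 9, "length": 20}),
]

ranked = [doc["id"] for _, doc in sorted(hits, key=sort_key)]
print(ranked)
```

Because the boost is now part of _score itself, the sort sees the final value for every matching document, at the cost of running the script on all of them instead of only a rescore window.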

Using Elasticsearch, how do I do a filtered query on both my document properties and nested document properties?

Here is an example document source.
{
"tags": [
"meow",
"cats",
"feline"
],
"visible": 1,
"for_sale": "y",
"title": "Cat Meow",
"stock": [{
"department": "mens",
"size": "small"
}, {
"department": "mens",
"size": "medium"
}]
}
I want to find documents that are 'stock.department=mens' and 'stock.size=medium' and also are 'for_sale=y'
Here is the query that I've come up with so far. I can't figure out how to filter by for_sale=y.
{
"size": 5,
"query": {
"filtered": {
"query": {
"multi_match": {
"fields": ["title", "tags"],
"query": "cat"
}
},
"filter": {
"nested": {
"path": "stock",
"filter": {
"bool": {
"must": [{
"term": {
"stock.size": "medium"
}
}, {
"term": {
"stock.department": "mens"
}
}]
}
}
}
}
}
}
}
This is what I've come up with. If anyone has any critiques or improvements please share them.
{
  "size": 5,
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "fields": ["title", "tags"],
          "query": "cat"
        }
      },
      "filter": {
        "bool": {
          "must": [{
            "term": {
              "for_sale": "y"
            }
          }, {
            "term": {
              "visible": 1
            }
          }, {
            "nested": {
              "path": "stock",
              "filter": {
                "bool": {
                  "must": [{
                    "term": {
                      "stock.size": "medium"
                    }
                  }, {
                    "term": {
                      "stock.department": "mens"
                    }
                  }]
                }
              }
            }
          }]
        }
      }
    }
  }
}
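For anyone wondering why the nested filter is needed at all, here is a small Python model of the filter part of this query. It is a sketch under stated assumptions: nested_match and flattened_match are hypothetical helpers, the second document is invented, and flattened_match stands for what happens when stock is a plain object field, where the values of each sub-field are pooled across the array entries.

```python
def nested_match(doc, path, conditions):
    # nested: one single entry has to satisfy every condition
    return any(all(item.get(k) == v for k, v in conditions.items())
               for item in doc.get(path, []))

def flattened_match(doc, path, conditions):
    # plain object field: each condition may be met by a different entry
    return all(any(item.get(k) == v for item in doc.get(path, []))
               for k, v in conditions.items())

def passes_filter(doc):
    # Top-level terms plus the nested clause, as in the bool/must above
    return (doc.get("for_sale") == "y"
            and doc.get("visible") == 1
            and nested_match(doc, "stock",
                             {"size": "medium", "department": "mens"}))

cat_meow = {  # the example document from the question
    "visible": 1, "for_sale": "y", "title": "Cat Meow",
    "stock": [{"department": "mens", "size": "small"},
              {"department": "mens", "size": "medium"}],
}
mixed = {  # hypothetical: medium exists only in womens
    "visible": 1, "for_sale": "y", "title": "Cat Purr",
    "stock": [{"department": "mens", "size": "small"},
              {"department": "womens", "size": "medium"}],
}

wanted = {"size": "medium", "department": "mens"}
print(passes_filter(cat_meow), passes_filter(mixed))
print(flattened_match(mixed, "stock", wanted))
```

The second document passes the pooled check even though no single stock entry is both mens and medium, which is exactly the false positive the nested filter avoids.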
