Elasticsearch - boosting specific documents in every search

I'm very new to Elasticsearch. I'm using it for filtering and also for boosting some fields at query time. This is the part of the query that does the boosting and filtering:
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"multi_match": {
"type": "best_fields",
"query": "exampleKeyword",
"fields": [
"exampleField1^0",
"exampleField2^50",
"exampleField3^10",
"exampleField4^10",
"exampleField5^5"
],
"boost": 50
}
}]
}
}
],
"filter": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"term": {
"bla": {
"value": ""
}
}
}
]
}
}, {
"term": {
"active": {
"value": "true"
}
}
},
{
"range": {
"closingDate": {
"gte": "201710310000",
"lte": "999912312359"
}
}
},
Now I want to boost some specific documents. I'll supply an array of integers for a field such as exampleField6, and if my search results contain documents whose value is in that array, those documents should get an extra boost of, say, 100 on my scale.
How can I do this? I don't want to expand the result set. I only want to boost the desired ids more when the results already contain them.

Using function_score, you can do something along these lines:
{
"query": {
"bool": {
"must": [
{
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"type": "best_fields",
"query": "bla",
"fields": [
"exampleField1^0",
"exampleField2^50",
"exampleField3^10",
"exampleField4^10",
"exampleField5^5"
],
"boost": 50
}
}
]
}
},
"functions": [
{
"filter": {
"ids": {
"values": [
1,
5
]
}
},
"weight": 10
}
],
"score_mode": "max",
"boost_mode": "multiply"
}
}
],
"filter": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"term": {
"bla": {
"value": ""
}
}
}
]
}
},
{
"term": {
"active": {
"value": "true"
}
}
},
{
"range": {
"closingDate": {
"gte": "201710310000",
"lte": "999912312359"
}
}
}
]
}
}
]
}
}
}
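To see why this rescoring does not widen the result set, here is a rough Python model of the arithmetic. It is only a sketch: the ids and scores are made up, and it assumes that a document matching none of the function filters gets a neutral function score of 1, so boost_mode multiply leaves its query score unchanged. If the integers live in a regular field such as exampleField6 rather than in _id, a terms filter on that field would presumably take the place of the ids filter.

```python
def combined_score(query_score, doc_id, boosted_ids, weight=10.0):
    """Model score_mode=max with one weight function and boost_mode=multiply."""
    # Assumption: a document that matches no function filter gets a factor of 1.
    function_score = weight if doc_id in boosted_ids else 1.0
    return query_score * function_score


# Hypothetical hits already returned by the inner query: _id -> query score.
hits = {"1": 2.0, "3": 3.5, "5": 0.4}
boosted = {"1", "5"}

rescored = {doc_id: combined_score(score, doc_id, boosted) for doc_id, score in hits.items()}
ranking = sorted(rescored, key=rescored.get, reverse=True)
print(ranking)
```

Only documents that were already hits get rescored, so the boosted ids move up without any new documents being pulled in.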

Related

Elasticsearch how to set different scores for different values of the same field?

I have documents with different type_id values in an ES index, and I want to give different type_id values different scores so that results of some types rank higher.
My query is:
{
"query":{
"bool":{
"must":[
{"terms":{"type_id":[9,10]}}
],
"should":[
{"match":{ "display_name":{"query":"keyword","boost":10}}},
{"match":{ "description":{"query":"keyword","boost":2}}}
]
}
}
}
I want documents with type_id 9 to score higher than those with type_id 10 when display_name and description match equally.
Please guide me on this problem.
Thanks.
You can group your queries as below and use boost to give more weight to certain type_id values.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"type_id": {
"value": 9,
"boost": 2
}
}
},
{
"term": {
"type_id": {
"value": 10,
"boost": 1
}
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"display_name": {
"query": "keyword",
"boost": 10
}
}
},
{
"match": {
"description": {
"query": "keyword",
"boost": 2
}
}
}
]
}
}
]
}
}
}
Edit: For the query in the comment, you can use function_score:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"type_id": {
"value": 9
}
}
}
],
"minimum_should_match": 1,
"should": [
{
"match": {
"display_name": {
"query": "keyword"
}
}
},
{
"match": {
"description": {
"query": "keyword"
}
}
}
]
}
},
"boost": 5
}
},
{
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"type_id": {
"value": 10
}
}
}
],
"minimum_should_match": 1,
"should": [
{
"match": {
"display_name": {
"query": "keyword"
}
}
},
{
"match": {
"description": {
"query": "keyword"
}
}
}
]
}
},
"boost": 4
}
}
]
}
}
}
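A possibly more compact variant is a single function_score with one weight function per type_id, instead of repeating the whole query for each type. The Python sketch below only builds the request body as a dict. The helper name type_weighted_query and the weights 5 and 4 are illustrative, and it assumes an Elasticsearch version whose bool query supports a filter clause.

```python
import json


def type_weighted_query(keyword, type_weights):
    """Build one function_score query with a weight function per type_id."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Restrict to the wanted types without affecting the score.
                        "filter": [{"terms": {"type_id": list(type_weights)}}],
                        "should": [
                            {"match": {"display_name": {"query": keyword, "boost": 10}}},
                            {"match": {"description": {"query": keyword, "boost": 2}}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # Each function multiplies the score of documents of one type.
                "functions": [
                    {"filter": {"term": {"type_id": type_id}}, "weight": weight}
                    for type_id, weight in type_weights.items()
                ],
                "score_mode": "max",
                "boost_mode": "multiply",
            }
        }
    }


body = type_weighted_query("keyword", {9: 5, 10: 4})
print(json.dumps(body, indent=2))
```

With these weights, a type_id 9 document and a type_id 10 document with the same text score end up in a 5:4 ratio, which keeps type 9 ahead when the text matches are equal.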

Weighted search on one field and a normal search on other field

I am trying to perform a search by matching the search query against either the tag or the name of the doc. I also have a filter on top, so I do have to use must.
Here is what I have been trying:
{
"query": {
"bool": {
"filter": {
"term": {
"type.primary": "audio"
}
},
"must": [
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"tags.tag": "big"
}
}
]
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
},
{
"bool": {
"must": [
{
"multi_match": {
"query": "big",
"fields": [
"name"
],
"type": "phrase_prefix"
}
}
]
}
}
]
}
}
}
This just returns an empty result.
If I use should instead of must, the query works, but it gives me all results matching the type.primary: audio filter.
I am pretty sure there is some other way to search on the name field. Thanks.
You're almost there! In your must, you declare that both tags and name have to match. Wrapping the two clauses in a bool with should, inside the must, requires at least one of them to match instead. Try the following:
GET /_search
{
"query": {
"bool": {
"filter": {
"term": {
"type.primary": "audio"
}
},
"must": [
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"score_mode": "sum",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"tags.tag": "big"
}
}
]
}
},
"field_value_factor": {
"field": "tags.weight"
},
"boost_mode": "multiply",
"boost": 10
}
}
}
},
{
"multi_match": {
"query": "big",
"fields": [
"name"
],
"type": "phrase_prefix"
}
}
]
}
}
]
}
}
}
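The difference between the three variants can be modelled with a small Python sketch. It is a toy model, not Elasticsearch code: each clause is reduced to a boolean saying whether it matched a document, and it assumes the default rule that should clauses become optional once a must or filter clause sits in the same bool.

```python
def bool_matches(must=(), should=(), has_filter=False):
    """Toy model of whether a bool query matches one document.

    Each entry in must/should is True if that clause matched the document.
    """
    if not all(must):
        return False
    if not should:
        return True
    # Assumed default: should is optional when must or filter is present.
    required = 0 if (must or has_filter) else 1
    return sum(should) >= required


# Hypothetical documents that already passed the type.primary filter.
tag_only = {"tag": True, "name": False}
neither = {"tag": False, "name": False}


def original(doc):
    """Both clauses in must: tag AND name are required."""
    return bool_matches(must=[doc["tag"], doc["name"]], has_filter=True)


def should_only(doc):
    """Both clauses in a should next to the filter: neither is required."""
    return bool_matches(should=[doc["tag"], doc["name"]], has_filter=True)


def fixed(doc):
    """A must that wraps a bool with should: tag OR name is required."""
    inner = bool_matches(should=[doc["tag"], doc["name"]])
    return bool_matches(must=[inner], has_filter=True)


for name, doc in (("tag_only", tag_only), ("neither", neither)):
    print(name, original(doc), should_only(doc), fixed(doc))
```

In this model the original query rejects a document that matches only the tag, the plain should version accepts even a document that matches neither clause, and the nested version accepts exactly the documents that match at least one.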

ES Must match filter

I have this fairly simple ES query and filter, using ES 2.3.5:
{
"query": {
"multi_match": {
"query": "image",
"fields": [
"ToRecipients",
"From",
"Subject"
]
}
},
"filter": {
"bool": {
"must": [
{
"match": {
"ToRecipients": "johndoe"
}
}
]
}
},
"sort": [
{
"DateTimeSent": {
"order": "desc"
}
}
]
}
For some reason it is not filtering by the ToRecipients field. The results coming back have all kinds of values for the field, not just johndoe.
Where have I gone wrong?
Try this query instead, which moves the filter into a bool query alongside the multi_match:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "image",
"fields": [
"ToRecipients",
"From",
"Subject"
]
}
}
],
"filter": {
"bool": {
"must": [
{
"match": {
"ToRecipients": "johndoe"
}
}
]
}
}
}
},
"sort": [
{
"DateTimeSent": {
"order": "desc"
}
}
]
}
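If the request body is assembled in code, the same shape can be produced by a small helper. This is a sketch in Python; filtered_search is a hypothetical name, not part of any client library. It puts the scoring query under must and the restrictions under filter, where they narrow the hits without contributing to the score.

```python
def filtered_search(query, filters, sort=None):
    """Wrap a scoring query and non-scoring filters in a single bool query."""
    body = {"query": {"bool": {"must": [query], "filter": list(filters)}}}
    if sort is not None:
        body["sort"] = sort
    return body


body = filtered_search(
    query={"multi_match": {"query": "image", "fields": ["ToRecipients", "From", "Subject"]}},
    filters=[{"match": {"ToRecipients": "johndoe"}}],
    sort=[{"DateTimeSent": {"order": "desc"}}],
)
print(body)
```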

Elasticsearch as a solution for automapping different data

This is a tricky one.
I'm currently working at a travel agency that needs to map its hotels to other agencies' hotels. So let's say that we have a hotel like this one:
Code123, Hotel name 123, street 123, postcode132, country123
And we want to map it to another hotel that is:
ACode123, Hotel 123 name, st 123, pc132, country123
Regarding this, I want to ask two questions:
Is Elasticsearch a good solution for this case? So far, I've gotten some good results thanks to Elasticsearch's search features, but I've also gotten some misleading matches (for instance, when long addresses should match short addresses).
The other one is: if it's a good solution, which approach should I take?
To give you more context, this is what I have so far:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"must": [
{
"query_string": {
"default_field": "name",
"query": "Holiday~ Inn~ Express~ Tianjin~ ",
"fuzzy_min_sim": 0.9
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
},
{
"query_string": {
"default_field": "country",
"query": "CHINA~ ",
"fuzzy_min_sim": 0.9
}
}
],
"should": [
{
"wildcard": {
"nameTerm": {
"wildcard": "*Holiday* Inn*",
"boost": 1
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "2000m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
},
{
"query_string": {
"default_field": "country",
"query": "CHINA~ ",
"fuzzy_min_sim": 0.9
}
}
],
"should": [
{
"wildcard": {
"nameTerm": {
"wildcard": "*Holiday* Inn* Express*",
"boost": 1.5
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "1500m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
},
{
"query_string": {
"default_field": "country",
"query": "CHINA~ ",
"fuzzy_min_sim": 0.9
}
}
],
"should": [
{
"query_string": {
"default_field": "addressNoNumbers",
"query": " ZHONGSHAN ROAD HEBEI DISTRICT",
"fuzzy_min_sim": 0.8
}
},
{
"match": {
"addressNumbers": {
"query": "288",
"boost": 1.5
}
}
},
{
"term": {
"nameTerm": {
"value": "Holiday Inn Express Tianjin",
"boost": 2
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "1000m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
}
],
"should": [
{
"match": {
"addressNumbers": {
"query": "288",
"boost": 1.5
}
}
},
{
"wildcard": {
"addressTerm": {
"wildcard": "*ZHONGSHAN* ROAD*",
"boost": 1
}
}
},
{
"term": {
"nameTerm": {
"value": "Holiday Inn Express Tianjin",
"boost": 2
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "500m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
}
],
"should": [
{
"match": {
"addressNumbers": {
"query": "288",
"boost": 1.5
}
}
},
{
"wildcard": {
"addressTerm": {
"wildcard": "*ZHONGSHAN* ROAD*",
"boost": 1
}
}
},
{
"term": {
"nameTerm": {
"value": "Holiday Inn Express Tianjin",
"boost": 2
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "300m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
],
"should": [
{
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "Holiday Inn Express Tianjin",
"boost": 1
}
}
}
],
"should": [
{
"match": {
"addressNumbers": {
"query": "288",
"boost": 1.5
}
}
},
{
"wildcard": {
"addressTerm": {
"wildcard": "*ZHONGSHAN* ROAD*",
"boost": 1
}
}
},
{
"term": {
"nameTerm": {
"value": "Holiday Inn Express Tianjin",
"boost": 2
}
}
}
]
}
},
"filter": {
"geo_distance": {
"distance": "100m",
"coordinates": {
"lon": 117.1852,
"lat": 39.12841
}
}
}
}
}
]
}
}
}
So there is lots of nesting, which works quite well when I have all the fields but not so well when the coordinates are missing.
But anyway, my main concern is whether I should go with Elasticsearch or not (and what the alternative could be!)
Thanks in advance!

random_score does not work properly with "should"

In the following code I always get "Alexander McQueen" products coming first, no matter what I set the seed to.
How can I change my search query to properly shuffle results?
{
"query": {
"function_score": {
"random_score": {
"seed": 99287
},
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"query_string": {
"query": "(adidas originals)",
"default_operator": "AND",
"fields": [
"name^4",
"description"
]
}
},
{
"terms": {
"category": [
"Fashion",
"Sports",
"Other",
""
]
}
},
{
"term": {
"currency": {
"term": "USD"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"query_string": {
"query": "(alexander mcqueen)",
"default_operator": "AND",
"fields": [
"name^4",
"description"
]
}
},
{
"terms": {
"category": [
"Fashion"
]
}
},
{
"term": {
"currency": {
"term": "USD"
}
}
}
]
}
}
]
}
}
}
},
"size": 40,
"from": 0
}
That's because the random score is being multiplied by the _score from the original query. If you want the results to be purely based on the random score, then set the boost_mode to replace (instead of the default multiply).
See the function_score documentation.
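The effect of boost_mode can be illustrated with a small Python simulation. This is a toy model with made-up scores: Python's random module stands in for random_score, so the actual values will differ from what Elasticsearch produces, but the way the two scores are combined is the point.

```python
import random


def rank(query_scores, seed, boost_mode):
    """Order doc ids by a toy function_score with a seeded random function."""
    rng = random.Random(seed)
    final = {}
    for doc_id in sorted(query_scores):  # fixed iteration order keeps runs reproducible
        r = rng.random()                 # stand-in for random_score, in [0, 1)
        if boost_mode == "replace":
            final[doc_id] = r                          # query score is discarded
        else:                                          # "multiply", the default
            final[doc_id] = query_scores[doc_id] * r
    return sorted(final, key=final.get, reverse=True)


# Hypothetical _score values: the McQueen clause scores far higher than the adidas one.
scores = {"adidas-1": 2.0, "adidas-2": 2.5, "mcqueen-1": 40.0, "mcqueen-2": 38.0}
flat = {doc_id: 1.0 for doc_id in scores}

# Count how often a McQueen product still comes first under multiply.
mcqueen_first = sum(
    rank(scores, seed, "multiply")[0].startswith("mcqueen") for seed in range(200)
)
print(mcqueen_first, "of 200 seeds put a McQueen product first with multiply")
print(rank(scores, 99287, "replace"))
```

With multiply, the large gap in query scores dominates whatever the random factor contributes, so changing the seed rarely changes which brand is on top. With replace, the order depends on the random values alone.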
