Update the score of Pinned Documents in Elastic Search - elasticsearch

I need certain documents to always appear at the top of the search results, so I use a pinned query. The pinned documents end up with a score of 1.7014122E38.
I also need to modify the score of these pinned documents, which I have not been able to do at the query level.
Sample Documents
{
"docs": [
{
"_id": 1,
"name": "jack"
},
{
"_id": 2,
"name": "ryan"
},
{
"_id": 3,
"name": "mark"
},
{
"_id": 4,
"name": "taylor"
},
{
"_id": 5,
"name": "taylor"
}
]
}
ES Query
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [
"3"
],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
}
}
Now I want to multiply the pinned documents' score by some value, which I have been unable to do in ES.
Can someone please help me to solve this requirement?

Since the scores of pinned documents are calculated at query time, there's no way of knowing in advance what they will end up being. It could be 1.7014122E38 but also 1.7014122402528844E38, etc.
What you could do is use a sort script and check whether the implicit score is unusually high (I chose Integer.MAX_VALUE as the boundary), which would indicate that you're dealing with a pinned document. If that's the case, you can override the pinned documents' scores however you like.
POST your-index/_search?track_scores&filter_path=hits.hits._id,hits.hits._source,hits.hits.sort
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [ "3" ],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
},
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": {
"source": "_score >= Integer.MAX_VALUE ? params.score_rewrite : _score",
"params": {
"score_rewrite": 42
}
}
}
}
]
}
Note that it's necessary to set the track_scores URI parameter because when sorting on a field, the scores are not computed by default.
That way, the resulting hits would look along the lines of:
{
"hits" : {
"hits" : [
{
"_id" : "3", <-- pinned ID
"_source" : {
"name" : "mark"
},
"sort" : [
42.0 <-- overridden sort
]
},
{
"_id" : "4",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926 <-- default sort
]
},
{
"_id" : "5",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926
]
}
]
}
}
P.S.: Integer.MAX_VALUE is arbitrary and there's absolutely no guarantee that it'll catch all pinned docs. In other words, a bit of experimentation will be needed to choose a bulletproof boundary.
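As a rough client-side illustration of the same idea, the Python sketch below mirrors the sort script's ternary and builds the sort clause as a plain dict. The names rewrite_score, build_sort_clause and PINNED_THRESHOLD are made up for this example, and the threshold is the same arbitrary Integer.MAX_VALUE boundary, so the caveat above applies here too.

```python
# Hypothetical helper names; the threshold mirrors Java's Integer.MAX_VALUE.
PINNED_THRESHOLD = 2**31 - 1


def rewrite_score(score, score_rewrite=42.0, threshold=PINNED_THRESHOLD):
    """Mirror of the script: _score >= threshold ? score_rewrite : _score."""
    return score_rewrite if score >= threshold else score


def build_sort_clause(score_rewrite=42):
    """Build the script-based sort clause used in the request above."""
    return [
        {
            "_script": {
                "order": "desc",
                "type": "number",
                "script": {
                    "source": "_score >= Integer.MAX_VALUE ? params.score_rewrite : _score",
                    "params": {"score_rewrite": score_rewrite},
                },
            }
        }
    ]


if __name__ == "__main__":
    # A pinned-looking score is replaced; an organic score passes through.
    print(rewrite_score(1.7014122e38))
    print(rewrite_score(0.875468730926))
```

Keeping the threshold as a parameter makes it easier to experiment with a different boundary without touching the rest of the request.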

Related

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, depending on whether the country matches as local or global.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If either the local or the global country matches, I want to apply a range query on the price field and boost that document by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the document.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because the document does not get boosted. I know that filter does not work with boost, so what is the solution for dynamic field mapping with a range query and a boost?
Please help me to solve this query.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries, here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
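Since the country code and the minimum price change per request, the query body could be generated by a small helper. The Python sketch below is only an illustration: the function name build_boosted_price_query and its parameters are assumptions, and it simply reproduces the structure of the query above as a dict.

```python
def build_boosted_price_query(country_code, min_price, boost=2.0):
    """Return a search body that boosts, but does not filter on, the price match."""
    # Local branch: the document's own country matches AND the local price is high enough.
    local_match = {
        "bool": {
            "must": [
                {"match": {"country_en_name": country_code}},
                {"range": {"lowest_local_price": {"gte": min_price}}},
            ]
        }
    }
    # Global branch: the per-country global price is high enough.
    global_field = "global_rates.%s.lowest_global_price" % country_code
    global_match = {"range": {global_field: {"gte": min_price}}}
    return {
        "query": {
            "bool": {
                "should": [
                    # match_all keeps non-matching documents in the result set.
                    {"match_all": {}},
                    {"bool": {"should": [local_match, global_match], "boost": boost}},
                ]
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_boosted_price_query("UK", 1000), indent=2))
```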
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id:2 is at the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping uses data types that do not allow a certain type of query. In versions before 7.11 one would have to reindex in such cases, but now it is possible to use runtime mappings instead (at the cost of a more expensive query).
In our case, country_en_name is indexed as text, which is suited for full-text search and not for exact lookups. We should use keyword instead. This is what the query might look like with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Nested Query Elastic Search

Currently I am trying to search/filter a nested Document in Elastic Search Spring Data.
The Current Document Structure is:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": [
{
"id": 1,
"status": true,
"issue": "Variation Issue"
},
{
"id": 32,
"status": false,
"issue": "NoiseIssue"
}
]
}
}
Now we need to filter policy_data down to the entries that have a Noise Issue. If no policy data has a Noise Issue, policy_data should be null inside the parent document.
I have tried to use this Query
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
},
{
"nested": {
"path": "policiesDetails.policy_data",
"query": {
"bool": {
"must": {
"terms": {
"policiesDetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
}
}
This works fine for filtering the nested documents. But if no nested document matches, the entire parent document is removed from the results.
What I want when the nested filter does not match:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": null
}
}
Because the nested query is in a must clause, the parent document is not returned when no nested document matches.
You can move the nested query on policy_data into a should clause instead. If a nested document matches, it is returned under inner_hits; otherwise the parent document is still returned.
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
}
],
"should": [
{
"nested": {
"path": "policydetails.policy_data",
"inner_hits": {}, --> to return matched policy_data
"query": {
"bool": {
"must": {
"terms": {
"policydetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
},
"_source": ["id","customername","policydetails.address"] --> selected fields
}
Result:
{
"_index" : "index116",
"_type" : "_doc",
"_id" : "f1SxGHoB5tcHqHDtAkTC",
"_score" : 0.2876821,
"_source" : {
"policydetails" : {
"address" : {
"city" : "Irvine",
"address2" : "23994384, Out OF World",
"post_code" : "92617",
"state" : "CA"
}
},
"id" : 1,
"customername" : "Cust#123"
},
"inner_hits" : {
"policydetails.policy_data" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ] --> nested query result , matched document returned
}
}
}
}
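To get the exact shape asked for in the question (policy_data set to null when nothing matches), the inner_hits can be folded back into the parent on the client side. The Python sketch below is one way to do that, assuming each hit has the layout shown in the result above; the function name apply_inner_hits is made up for this example.

```python
def apply_inner_hits(hit, path="policydetails.policy_data"):
    """Return the hit's _source with policy_data taken from inner_hits, or None."""
    # Copy the outer levels so the original hit is not modified.
    source = dict(hit.get("_source", {}))
    details = dict(source.get("policydetails", {}))
    inner = (
        hit.get("inner_hits", {})
        .get(path, {})
        .get("hits", {})
        .get("hits", [])
    )
    matched = [h["_source"] for h in inner]
    # An empty list of matches becomes None (null in JSON).
    details["policy_data"] = matched or None
    source["policydetails"] = details
    return source


if __name__ == "__main__":
    example_hit = {
        "_source": {
            "id": 1,
            "customername": "Cust#123",
            "policydetails": {"address": {"city": "Irvine", "state": "CA"}},
        },
        "inner_hits": {"policydetails.policy_data": {"hits": {"hits": []}}},
    }
    print(apply_inner_hits(example_hit))
```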

Elasticsearch multi-match fields doesn't include query string

I am doing a multi-match search using the following query object:
{
_source: [
'baseline',
'cdrp',
'date',
'description',
'dev_status',
'element',
'event',
'id'
],
track_total_hits: true,
query: {
bool: {
filter: [{name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]}],
should: [
{
multi_match:{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
}
},
highlight: { fields: { '*': {} } },
sort: [],
from: 0,
size: 50
}
I'm expecting the word "national" to be found within the description or narrative.* fields, but only one of the 2 records returned meets my expectations. I'm trying to understand why.
elasticsearch.config.ts
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding"
],
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
}
}
A should clause works like OR: it doesn't filter out documents, it only affects scoring. Documents that match the should clause are scored higher.
If you want to filter on the multi_match, you can move it inside the filter clause:
filter: [
{
name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]
},
{
multi_match:
{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
Filter vs. must: both return only documents that match the specified clauses, but filter doesn't score them. So if you are not interested in the score or in the order of the returned documents, you can use filter. The two behave the same apart from scoring.
Documents with more matches are scored higher
Multi_match by default uses best_fields
Finds documents which match any field, but uses the _score from the
best field.
In other words, each document's score is taken from its single highest-scoring field, not from the number of fields that matched.
Example
Document 1 has matches in two fields: field1 (score 2) and field2 (score 1)
Document 2 has a match in one field: field2 (score 3)
Document 2 will be ranked higher even though only one of its fields matched.
You can change it to most_fields
Finds documents which match any field and combines the _score from
each field.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "test",
"fields": [],
"type": "most_fields"
}
}
]
}
}
}
Still, a document with fewer matching fields can be ranked higher because of a high score in one field caused by multiple matching terms.
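The difference between the two types can be checked with the toy scores from the example above. The Python sketch below is a simplified model, not Elasticsearch's actual scoring code: it assumes best_fields takes the best field's score (plus an optional tie_breaker share of the others) and most_fields adds the per-field scores together. The function names are made up for illustration.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Best field's score, plus tie_breaker times the remaining fields' scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    return best + tie_breaker * (sum(field_scores.values()) - best)


def most_fields_score(field_scores):
    """Sum of the scores of all matching fields."""
    return float(sum(field_scores.values()))


if __name__ == "__main__":
    # Per-field scores taken from the example above.
    doc1 = {"field1": 2.0, "field2": 1.0}
    doc2 = {"field2": 3.0}
    for name, doc in (("doc1", doc1), ("doc2", doc2)):
        print(name, best_fields_score(doc), most_fields_score(doc))
```

With these numbers Document 2 wins under best_fields, while under most_fields the two documents tie, which is why one strong field can still keep a document with fewer matching fields at or near the top.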
If you want each field to contribute the same score regardless of the number of tokens matched, you need to use a constant_score query:
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"field1": "test"
}
}
}
},
{
"constant_score": {
"filter": {
"term": {
"field2": "test"
}
}
}
}
]
}
},
"highlight": {
"fields": {
"field1": {},
"field2": {}
}
}
}
Result:
"hits" : [
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iSCe6nEB8J88APx3YBGn",
"_score" : 2.0, --> one score per field matched
"_source" : {
"field1" : "test",
"field2" : "test"
},
"highlight" : {
"field1" : [
"<em>test</em>"
],
"field2" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iiCe6nEB8J88APx3ghF-",
"_score" : 1.0,
"_source" : {
"field1" : "test",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iyCf6nEB8J88APx3UhF8",
"_score" : 1.0,
"_source" : {
"field1" : "test do",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em> do"
]
}
}
]
}

Elasticsearch match one array with another array

Let's say I have two indexes kids and outings_for_kids with the following data
kids
[
{
"name": "little kid 1",
"i_like":["drawing","teddybears"]
},
]
outings for kids
[
{
"name": "Teddybear drawing fights with apples!",
"for_kids_that_like":["apples","teddybears","drawing", "play outside games"]
},
{
"name": "drawing and teddies!",
"for_kids_that_like":["teddybears","drawing"]
}
]
I want to find the outings that match the things little kid 1 likes, with a lower score if an outing caters to more things than that.
Little kid 1 should not match 100% with the first outing. It has what little kid 1 wants, but it also has more (e.g. apples), so it should match 50%.
It should match 100% with the second outing.
This will be a two-step process:
1. Get the i_like values from the kids index.
2. Use the i_like values from step 1 to query the outings index:
- Use term queries to match each value.
- Use a script to compare the array size with the number of values searched.
- Use constant_score to assign a fixed score based on that comparison.
Query
GET outings/_search
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()==2" --> replace 2 with size of elements searched
}
}
]
}
},
"boost": 100
}
},
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()>2"
}
}
]
}
},
"boost": 50
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IH7tVHEBbLcSRUWr6wPj",
"_score" : 100.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing"
]
}
},
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IX7zVHEBbLcSRUWrhgM9",
"_score" : 50.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing",
"apples"
]
}
}
]
If you just want to show exact-match documents on top followed by partial matches, then you don't need constant_score (a must query with term searches will work). By default, exact matches are given a higher score.
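Because the number of term clauses and the size in the script depend on the kid's i_like values, step 2 is easier to generate than to write by hand. The Python sketch below builds the same query structure from a list of likes; build_outings_query and its parameters are hypothetical names, and fetching i_like from the kids index (step 1) is assumed to have happened already.

```python
def build_outings_query(likes, field="for_kids_that_like",
                        exact_boost=100, partial_boost=50):
    """Build the two constant_score branches for a list of liked things."""
    # One term clause per liked thing; every one of them must match.
    terms = [{"term": {field: {"value": like}}} for like in likes]

    def scored(size_check, boost):
        # The script compares the stored array size with the number of likes.
        script = {
            "script": {
                "script": "doc['%s.keyword'].size()%s" % (field, size_check)
            }
        }
        return {
            "constant_score": {
                "filter": {"bool": {"must": terms + [script]}},
                "boost": boost,
            }
        }

    n = len(likes)
    return {
        "query": {
            "bool": {
                "should": [
                    scored("==%d" % n, exact_boost),   # same things, nothing extra
                    scored(">%d" % n, partial_boost),  # same things plus more
                ]
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_outings_query(["teddybears", "drawing"]), indent=2))
```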

Elasticsearch: how to search on computed fields

Using the data from the current version of Elasticsearch: The Definitive Guide, that is:
[{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}, {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}, {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}]
I'm trying to run a simple computation (I've enabled Groovy scripting) using the request data:
{
"query": {
"filtered": {
"filter": {
"range": {
"max_age": {
"gt": 150
}
}
}
}
},
"script_fields": {
"max_age": {
"script": "_source.age * 5"
}
}
}
But ES isn't returning any data. How can I search over computed fields? It's even better if I don't need to enable scripting.
script_fields are computed after the query phase, i.e. during the fetch phase, so you cannot reference script fields inside your queries.
What you need to achieve can still be done, though, by using a script filter, like this:
{
"query": {
"bool": {
"must": {
"script": {
"script": {
"inline": "doc['age'].value * factor > max_age",
"params": {
"factor": 5,
"max_age": 150
}
}
}
}
}
}
}
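For reference, here is a small Python sketch of the same filter. It makes two assumptions that are not in the answer above: that the target is a newer Elasticsearch version using Painless, where the script key is source rather than inline and parameters are read through params, and that the sample data from the question is available locally to check which people the condition selects. The function names are made up for illustration.

```python
def build_script_filter_query(factor=5, max_age=150):
    """Script query body, assuming Painless syntax (source key, params.* access)."""
    return {
        "query": {
            "bool": {
                "must": {
                    "script": {
                        "script": {
                            "source": "doc['age'].value * params.factor > params.max_age",
                            "params": {"factor": factor, "max_age": max_age},
                        }
                    }
                }
            }
        }
    }


def matches(person, factor=5, max_age=150):
    """Local mirror of the script condition, for checking the expected matches."""
    return person["age"] * factor > max_age


PEOPLE = [
    {"first_name": "John", "age": 25},
    {"first_name": "Jane", "age": 32},
    {"first_name": "Douglas", "age": 35},
]

if __name__ == "__main__":
    # John: 125, Jane: 160, Douglas: 175; only values above 150 pass.
    print([p["first_name"] for p in PEOPLE if matches(p)])
```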
