Elasticsearch: how to search on computed fields

Using the data from the current version of Elasticsearch: The Definitive Guide, that is:
[{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}, {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}, {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}]
I'm trying to run a simple computation (I've enabled Groovy scripting) using the request data:
{
"query": {
"filtered": {
"filter": {
"range": {
"max_age": {
"gt": 150
}
}
}
}
},
"script_fields": {
"max_age": {
"script": "_source.age * 5"
}
}
}
But ES isn't returning any data. How can I search over computed fields? It would be even better if I didn't need to enable scripting.

script_fields are computed after the query phase, i.e. during the fetch phase, so you cannot reference script fields inside your queries.
What you need to achieve can still be done, though, by using a script filter, like this:
{
"query": {
"bool": {
"must": {
"script": {
"script": {
"inline": "doc['age'].value * factor > max_age",
"params": {
"factor": 5,
"max_age": 150
}
}
}
}
}
}
}
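Since the computation in this particular question is a multiplication by a positive constant, the same filter can probably be expressed without any scripting: age * 5 > 150 is equivalent to age > 30. Below is a minimal Python sketch of that rewrite. The helper name `range_for_scaled_field` is hypothetical, the sketch assumes a positive factor, and the comparison against the sample documents runs locally rather than against a cluster.

```python
def range_for_scaled_field(field, factor, threshold):
    """Build a plain range query equivalent to `field * factor > threshold`.

    Only valid for factor > 0 (dividing by a negative factor would flip the
    inequality), which is an assumption of this sketch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {"query": {"range": {field: {"gt": threshold / factor}}}}


# Sample documents from the question (only the fields used here).
people = [
    {"first_name": "John", "age": 25},
    {"first_name": "Jane", "age": 32},
    {"first_name": "Douglas", "age": 35},
]

body = range_for_scaled_field("age", 5, 150)
bound = body["query"]["range"]["age"]["gt"]

# Local check that the scripted and the range formulation select the same docs.
by_script = [p["first_name"] for p in people if p["age"] * 5 > 150]
by_range = [p["first_name"] for p in people if p["age"] > bound]
print(body)
print(by_script, by_range)
```

For computations that cannot be inverted this way, the script filter shown above remains the general option.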

Related

How to sort a query in Elasticsearch based on another field?

In Elasticsearch, I'm storing data as below:
{
"name" : "John Doe",
"country" : "India"
},
{
"name" : "John Doe",
"country" : "USA"
}
If the user enters "John", I search the name field with this query:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "name:(John*)"
}
}
]
}
}
}
The query above returns something like:
{
"name" : "John Doe",
"country" : "India"
},
{
"name" : "John William",
"country" : "USA"
},
{
"name" : "John Smith",
"country" : "Canada"
},
{
"name" : "John Lobo",
"country" : "India"
}
I get every result with the name John, but I want John from country India at the top, then Canada, and then all other countries. In other words, documents with country=India should be weighted highest, followed by country=Canada.
Documents with country=India and Canada should score higher in the search results.
I found the boosting query in Elasticsearch, but I could not build the query I need with it, because I can only specify one field value:
{
"query": {
"boosting": {
"positive": {
"term": {
"country.keyword": "India"
}
},
"negative": {
"match_all": {}
},
"negative_boost": 0.5
}
}
}
According to https://www.elastic.co/guide/en/elasticsearch/reference/7.17/sort-search-results.html
you can apply script-based sorting. The example given there is:
{
"query": {
"term": { "user": "kimchy" }
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "doc['field_name'].value * params.factor",
"params": {
"factor": 1.1
}
},
"order": "asc"
}
}
}
So you could pass a script of your own. Logically, it could be an ascending sort on values such as:
India -> AAAAA
Canada -> AAAAB
For every other document, use the country name itself as the sort value.
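The idea above (give preferred countries small sort keys and sort ascending) can be sketched as a sort clause built in Python. This is only a sketch under assumptions: the `country.keyword` field is taken from the question's boosting attempt, the helper `country_priority_sort` and the `rank`/`fallback` parameter names are made up for illustration, and numeric ranks are used in place of the AAAAA/AAAAB strings. The ordering is checked locally with a Python mirror of the Painless expression, not against a cluster.

```python
def country_priority_sort(preferred):
    """Return a script-sort clause ranking `preferred` countries first, in order."""
    rank = {country: i for i, country in enumerate(preferred)}
    # Assumes every document has a value for country.keyword.
    source = (
        "def c = doc['country.keyword'].value; "
        "return params.rank.containsKey(c) ? params.rank.get(c) : params.fallback;"
    )
    return {
        "_script": {
            "type": "number",
            "script": {
                "lang": "painless",
                "source": source,
                "params": {"rank": rank, "fallback": len(preferred)},
            },
            "order": "asc",
        }
    }


def local_rank(doc, clause):
    """Mirror the Painless expression in Python, for a quick local check."""
    params = clause["_script"]["script"]["params"]
    return params["rank"].get(doc["country"], params["fallback"])


clause = country_priority_sort(["India", "Canada"])
hits = [
    {"name": "John Doe", "country": "India"},
    {"name": "John William", "country": "USA"},
    {"name": "John Smith", "country": "Canada"},
    {"name": "John Lobo", "country": "India"},
]
ordered = sorted(hits, key=lambda d: local_rank(d, clause))
print([d["country"] for d in ordered])
```

In a real request the clause would go into the "sort" array, typically followed by "_score" so that relevance still orders the documents within each country group.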

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, based on whether the local or the global country matches.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If either the local or the global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither boosts the doc. I know filter does not work with boost, so what is the solution for dynamic field mapping with a range query and boost?
Please help me to solve this query.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries. Here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as follows:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id:2 is at the bottom because it didn't match any of the boosted queries.
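To make the nesting easier to reuse for other countries and price limits, the query above could be generated by a small helper. The following Python sketch is illustrative only: `boosted_price_query` and `in_boosted_branch` are hypothetical names, and the local predicate uses an exact string comparison as a simplified stand-in for the analyzed match query. It reproduces, without a cluster, which of the three test documents fall into the boosted branch.

```python
def boosted_price_query(country, min_price, boost=2):
    """Build the nested bool query from the answer for one country code."""
    local_branch = {
        "bool": {
            "must": [
                {"match": {"country_en_name": country}},
                {"range": {"lowest_local_price": {"gte": min_price}}},
            ]
        }
    }
    global_field = f"global_rates.{country}.lowest_global_price"
    global_branch = {"range": {global_field: {"gte": min_price}}}
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_all": {}},  # keeps non-matching documents in the result
                    {"bool": {"should": [local_branch, global_branch], "boost": boost}},
                ]
            }
        }
    }


def in_boosted_branch(doc, country, min_price):
    """Local stand-in for the boosted clause (simplified, no text analysis)."""
    local = doc["country_en_name"] == country and doc["lowest_local_price"] >= min_price
    global_price = doc["global_rates"].get(country, {}).get("lowest_global_price")
    return local or (global_price is not None and global_price >= min_price)


# The three test documents from the answer.
docs = {
    "1": {"country_en_name": "UK", "lowest_local_price": 1500,
          "global_rates": {"FR": {"lowest_global_price": 1000},
                           "US": {"lowest_global_price": 1200}}},
    "2": {"country_en_name": "FR", "lowest_local_price": 900,
          "global_rates": {"UK": {"lowest_global_price": 950},
                           "US": {"lowest_global_price": 1500}}},
    "3": {"country_en_name": "US", "lowest_local_price": 950,
          "global_rates": {"UK": {"lowest_global_price": 1100},
                           "FR": {"lowest_global_price": 1000}}},
}

body = boosted_price_query("UK", 1000)
boosted_ids = [i for i, d in docs.items() if in_boosted_branch(d, "UK", 1000)]
print(boosted_ids)
```

Documents 1 and 3 land in the boosted branch and document 2 does not, which is consistent with the ordering of the response shown above.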
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping has data types that do not permit a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings instead, at the cost of a more expensive query.
In our case, country_en_name is indexed as text, which is suited for full-text search and not for exact lookups. We should rather use keyword. This is what the query may look like with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Elasticsearch Sort by: element presence in the array and date

I want to build a query on v5.6 where I find all books that:
belong to a user or are marked as favourite by that user
and then sort them by the fact that they are his favourite first and date desc.
Book record would look like this
{
title: "book1",
users_favourite: ["1", "2"],
userid: "1",
date: timestamp
}
I have most of the query but I don't know how to put favourite books on top and sorted by date:
{
"from" : 0,
"size" : 51,
"query" : {
"bool" : {
"should" : [
{
"match" : {
"users_favourite" : {
"query" : "1",
"operator" : "OR"
}
}
},
{
"term" : {
"userid" : {
"value" : "1"
}
}
}
]
}
},
"sort" : [
{
"date" : {
"order" : "desc",
"missing" : "_last",
"unmapped_type" : "string"
}
}
]
}
So the output would be: all users favourite books first, sorted by date
followed by books owned by that user also ordered by date.
user's favourite | date
true | 2021
true | 2020
true | 2018
false | 2021
false | 2019
false | 2017
The general solution I would use for the sorting is something like this:
GET social_unique_flat_verified_guesses/_search
{
"from": 0,
"size": 51,
"query": {
"bool": {
"should": [
{
"match": {
"users_favourite": {
"query": "1",
"operator": "OR"
}
}
},
{
"term": {
"userid": {
"value": "1"
}
}
}
]
}
},
"sort": [
{
"_script": {
"type": "number",
"script": {
"lang": "painless",
"source": "doc['users_favourite'].contains(params.users_favourite) ? 1 : 0",
"params": {
"users_favourite": "1"
}
},
"order": "desc"
}
},
{
"date": {
"order": "desc",
"missing": "_last",
"unmapped_type": "string"
}
}
]
}
This tells Elasticsearch to sort the matched books first by whether they are in the user's favourites (1 or 0, descending), then by date.
Also I would change the should clause for the user favorite to this
"term": {
"users_favourite": {
"value": "1"
}
}
Keep in mind that if the "users_favourite" field can be empty, or if the query can search for multiple user favorites the query needs to be a bit different. So let me know and I'll edit it
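The intended effect of the two-level sort (favourite flag first, then date, both descending) can be checked locally before running it against an index. The Python sketch below is only a simulation under assumptions: the helper `favourites_first` and the book titles are made up, and plain years stand in for the timestamp field.

```python
def favourites_first(books, user_id):
    """Order books like the two-level sort: favourite flag desc, then date desc."""
    def key(book):
        is_favourite = 1 if user_id in book["users_favourite"] else 0
        # Negating both parts turns Python's ascending sort into descending.
        return (-is_favourite, -book["date"])
    return sorted(books, key=key)


# Hypothetical books; every one is owned by user "1" or marked as their favourite.
books = [
    {"title": "b1", "users_favourite": [], "userid": "1", "date": 2021},
    {"title": "b2", "users_favourite": ["1", "2"], "userid": "3", "date": 2018},
    {"title": "b3", "users_favourite": ["1"], "userid": "1", "date": 2021},
    {"title": "b4", "users_favourite": ["2"], "userid": "1", "date": 2017},
    {"title": "b5", "users_favourite": ["1"], "userid": "2", "date": 2020},
    {"title": "b6", "users_favourite": [], "userid": "1", "date": 2019},
]

result = [("1" in b["users_favourite"], b["date"]) for b in favourites_first(books, "1")]
for is_favourite, year in result:
    print(is_favourite, year)
```

The printed rows follow the same favourite/date pattern as the expected output table in the question.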

Update the score of Pinned Documents in Elastic Search

I have a requirement to show some documents always on top of search results and for that, I have used pinned query to pin some documents and the pinned documents will have a score value of 1.7014122E38.
But I have another requirement to modify this score of pinned documents which I'm unable to achieve at the query level.
Sample Documents
{
"docs": [
{
"_id": 1,
"name": "jack"
},
{
"_id": 2,
"name": "ryan"
},
{
"_id": 3,
"name": "mark"
},
{
"_id": 4,
"name": "taylor"
},
{
"_id": 5,
"name": "taylor"
}
]
}
ES Query
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [
"3"
],
"organic": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"name": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
}
]
}
}
}
Now I want to multiply the pinned document score weight with some value which I'm unable to achieve in ES.
Can someone please help me to solve this requirement?
Since the pinned queries' scores are calculated at query time, there's no way of knowing what they will end up being. It could be 1.7014122E38 but also 1.7014122402528844E38 etc.
What you could do is use a sort script and check whether the implicit score is unusually high (I chose Integer.MAX_VALUE as the boundary), which would indicate whether or not you're dealing with a pinned document. If that's the case, you can override the pinned documents' scores however you like.
POST your-index/_search?track_scores&filter_path=hits.hits._id,hits.hits._source,hits.hits.sort
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [ "3" ],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
},
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": {
"source": "_score >= Integer.MAX_VALUE ? params.score_rewrite : _score",
"params": {
"score_rewrite": 42
}
}
}
}
]
}
Note that it's necessary to set the track_scores URI parameter because when sorting on a field, the scores are not computed by default.
That way, the resulting hits would look along the lines of:
{
"hits" : {
"hits" : [
{
"_id" : "3", <-- pinned ID
"_source" : {
"name" : "mark"
},
"sort" : [
42.0 <-- overridden sort
]
},
{
"_id" : "4",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926 <-- default sort
]
},
{
"_id" : "5",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926
]
}
]
}
}
P.S.: Integer.MAX_VALUE is arbitrary and there's absolutely no guarantee that it'll catch all pinned docs. In other words, a bit of experimentation will be needed to choose a bulletproof boundary.
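The effect of the threshold check can be tried out locally before touching the cluster. The Python sketch below mirrors the Painless expression from the sort script; the helper names `rewritten_sort_value` and `rank` are hypothetical, and the scores are the example values quoted in this thread.

```python
INTEGER_MAX_VALUE = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def rewritten_sort_value(score, score_rewrite=42):
    """Mirror of: _score >= Integer.MAX_VALUE ? params.score_rewrite : _score"""
    return float(score_rewrite) if score >= INTEGER_MAX_VALUE else score


def rank(hits, score_rewrite=42):
    """Sort (id, score) pairs by the rewritten value, descending."""
    rewritten = [(doc_id, rewritten_sort_value(s, score_rewrite)) for doc_id, s in hits]
    return sorted(rewritten, key=lambda pair: pair[1], reverse=True)


# (id, _score) pairs: document 3 is pinned, 4 and 5 are organic matches.
hits = [("4", 0.875468730926), ("3", 1.7014122e38), ("5", 0.875468730926)]

ranked = rank(hits)
demoted = rank(hits, score_rewrite=0.5)
print(ranked)
print(demoted)
```

One thing the simulation makes visible: if score_rewrite is smaller than the organic scores (0.5 here), the pinned document drops below the organic hits, so the replacement value needs to be chosen with the expected score range in mind.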

How to boost certain documents if the search query contains a certain term/text in elastic

If the search query contains "fruits", I want to boost the products from a certain category.
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 2,
"filter": {
"term": { "categories": "3" }
}
}
}
]
}
}
}
The query above gives a constant score to items in category 3. I want to apply this constant score (boost, increased relevancy) only when a certain text, for example fruits, is present in the search term.
Sample elasticsearch document
{
"id" : 1231,
"name" : {
"ar" : "Arabic fruit name",
"en" : "english fruit name"
},
"categories" : [3,1,3] // category ids because the same product is shown in multiple categories
}
How do I achieve this? I use elasticsearch 7.2
Original answer:
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"boost": 2,
"filter": {
"bool": {
"filter": [
{
"term": {
"categories": "3"
}
}
],
"should": [
{
"match": {
"name.ar": "fruit"
}
},
{
"match": {
"name.en": "fruit"
}
}
],
"minimum_should_match": 1
}
}
}
}
]
}
}
}
That is, if I understand correctly what you're looking for.
By the way, I suggest using "match_phrase" instead of "match" if you want to match "fruit name" exactly and not "fruit" or "name".
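If the query body is assembled in application code, the structure above can be produced by a small builder, which also makes the match/match_phrase switch a single flag. This Python sketch is illustrative only: the function `category_boost_query` and its parameters are hypothetical, and the field names name.ar, name.en and categories are taken from the sample document.

```python
def category_boost_query(term, category="3", boost=2, languages=("ar", "en"), phrase=False):
    """Build the constant_score boost: category filter AND term in any name field."""
    match_type = "match_phrase" if phrase else "match"
    name_clauses = [{match_type: {f"name.{lang}": term}} for lang in languages]
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "constant_score": {
                            "boost": boost,
                            "filter": {
                                "bool": {
                                    "filter": [{"term": {"categories": category}}],
                                    "should": name_clauses,
                                    # at least one language field must match
                                    "minimum_should_match": 1,
                                }
                            },
                        }
                    }
                ]
            }
        }
    }


body = category_boost_query("fruit")
inner = body["query"]["bool"]["should"][0]["constant_score"]["filter"]["bool"]
phrase_body = category_boost_query("fruit name", phrase=True)
print(inner["should"])
```

Calling it with phrase=True swaps every name clause to match_phrase without touching the rest of the structure.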
Update: (based on the comment)
In that case I'd suggest reorganizing your schema in the following manner:
"item": {
"properties": {
"name": {
"type": "text"
},
"language": {
"type": "keyword"
}
}
}
So your sample would become:
{
"id" : 1231,
"item" : [
{"name": "Arabic fruit name", "language": "ar"},
{"name": "english fruit name", "language": "en"}
],
"categories" : [3,1,3]
}
And then you can match against "item.name".
Why? Because Elasticsearch, at least by default, flattens the array when indexing, so internally the names look like ["Arabic fruit name", "english fruit name"].
With your original sample, two different fields are created in the schema (name.ar and name.en), which is not a great design if you need to scale.
