I am working on an Elasticsearch project. I want to get an additional column in the response when an index is queried. For example, if I have an index with two columns, num1 and num2, a query should return both columns (num1 and num2) plus an additional column add_result (which is the sum of the two). If I query it normally, as below, it responds with just the two columns:
{
"query": {
"match_all": {}
}
}
In my use case I have tried:
{
"runtime_mappings": {
"add_result": {
"type": "double",
"script": "emit(doc['file_count'].value + doc['follower_count'].value)"
}
},
"query": {
"match_all": {}
}
}
Yes, there are 2 ways:
1. Using runtime field
Runtime fields were introduced as a beta feature in Elasticsearch 7.11 and became generally available in 7.12. Simply make a GET request to the _search endpoint with a request body like this:
{
"runtime_mappings": {
"add_result": {
"type": "double",
"script": "emit(doc['num1'].value + doc['num2'].value)"
}
},
"fields": ["add_result", "num*"],
"query": {
"match_all": {}
}
}
You need to explicitly specify that you want to get your runtime fields back in the fields parameter.
2. Using script_field
The request looks like this:
{
"query": {
"match_all": {}
},
"fields": ["num*"],
"script_fields": {
"add_result": {
"script": {
"lang": "painless",
"source": "doc['num1'].value + doc['num2'].value"
}
}
}
}
Note that you still need to have the fields parameter, but you don't need to include your script_field (add_result in this case) in the fields parameter.
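For comparison, here is a small Python sketch that builds both request bodies as plain dictionaries. The helper names (sum_with_runtime_field, sum_with_script_field) are made up for illustration; the resulting dictionary can be serialized with json.dumps and sent to _search with whatever HTTP client you already use.

```python
import json


def sum_with_runtime_field(a: str, b: str, out: str = "add_result") -> dict:
    """Search body that defines `out` as a runtime field and requests it back."""
    return {
        "runtime_mappings": {
            out: {
                "type": "double",
                "script": {"source": f"emit(doc['{a}'].value + doc['{b}'].value)"},
            }
        },
        # Runtime fields are only returned when listed in "fields".
        "fields": [out, a, b],
        "query": {"match_all": {}},
    }


def sum_with_script_field(a: str, b: str, out: str = "add_result") -> dict:
    """Search body that computes `out` as a script field."""
    return {
        "query": {"match_all": {}},
        # The script field itself does not need to be listed here.
        "fields": [a, b],
        "script_fields": {
            out: {
                "script": {
                    "lang": "painless",
                    "source": f"doc['{a}'].value + doc['{b}'].value",
                }
            }
        },
    }


print(json.dumps(sum_with_runtime_field("num1", "num2"), indent=2))
```

The practical difference is that a runtime field can also be used in queries, sorts and aggregations, while a script field only decorates the hits that are returned.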
My mapping has two properties:
"news_from_date" : {
"type" : "string"
},
"news_to_date" : {
"type" : "string"
},
Search results have the properties news_from_date, news_to_date
curl -X GET 'http://172.2.0.5:9200/test_idx1/_search?pretty=true' 2>&1
Result:
{
"news_from_date" : "2022-05-30 00:00:00",
"news_to_date" : "2022-06-23 00:00:00"
}
Question is: How can I boost all results with the current date being in between their "news_from_date"-"news_to_date" interval, so they are shown as highest ranking results?
Tldr;
First off, if you are going to work with dates, you should probably use one of the date types provided by Elasticsearch.
There are many ways to approach your problem: using Painless, using a scoring function, or even using more classical query types.
Using Should
The bool query type supports four kinds of clauses:
must
filter
must_not
should
A should clause is optional: documents that match it get a higher final score, but a document does not have to match it to be returned.
So you could go with:
GET _search
{
"query": {
"bool": {
"should": [
{
"range": {
"news_from_date": {
"lte": "now"
}
}
},
{
"range": {
"news_to_date": {
"gte": "now"
}
}
}
]
}
}
}
Be aware that:
You can use the minimum_should_match parameter to specify the number or percentage of should clauses returned documents must match.
If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.
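As a rough sketch of how the default described above can be used (this is not part of the original answer): the Python helper below builds a request body in which every document is returned through a must clause, so minimum_should_match defaults to 0, and documents whose interval contains the current time get an extra constant score from a single should clause. It assumes both fields have been remapped as date fields with a format that matches values like "2022-05-30 00:00:00"; the helper name boost_active_news and the default boost are invented for illustration.

```python
import json


def boost_active_news(boost: float = 10.0) -> dict:
    """Return all documents, ranking those whose date interval contains now first."""
    # Both range conditions must hold for a document to count as "active".
    active_now = {
        "constant_score": {
            "filter": {
                "bool": {
                    "filter": [
                        {"range": {"news_from_date": {"lte": "now"}}},
                        {"range": {"news_to_date": {"gte": "now"}}},
                    ]
                }
            },
            # constant_score gives every matching document this score.
            "boost": boost,
        }
    }
    return {
        "query": {
            "bool": {
                # A must clause is present, so the should clause is optional.
                "must": [{"match_all": {}}],
                "should": [active_now],
            }
        }
    }


print(json.dumps(boost_active_news(), indent=2))
```

Wrapping both ranges in one should clause avoids giving a partial boost to documents that satisfy only one side of the interval.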
Using a script
As described in the documentation, you can write a custom script to score your documents according to your own business rules.
The script is written in Painless (a scripting language with a Java-like syntax).
GET /_search
{
"query": {
"function_score": {
"query": {
"match": { "message": "elasticsearch" }
},
"script_score": {
"script": {
"source": "Math.log(2 + doc['my-int'].value)"
}
}
}
}
}
Say I create an index people which will take entries that will have two properties: name and friends
PUT /people
{
"mappings": {
"properties": {
"friends": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and I put two entries, each one of them has two friends.
POST /people/_doc
{
"name": "Jack",
"friends": [
"Jill", "John"
]
}
POST /people/_doc
{
"name": "Max",
"friends": [
"John", "John" # Max will have two friends, but both named John
]
}
Now I want to search for people that have multiple friends
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['friends.keyword'].length > 1"
}
}
}
]
}
}
}
This returns only Jack and ignores Max. I assume this is because the script is actually traversing the inverted index, where John and John create only one token ('john'), so the number of tokens is actually 1 here.
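One way to picture this, as a local simulation rather than anything Elasticsearch runs: keyword doc values keep each distinct value only once per document, so a script that reads doc[...] sees a de-duplicated view, while the original array in _source keeps both entries. The function doc_values_view is a made-up stand-in for that behavior.

```python
def doc_values_view(values):
    """Sorted, de-duplicated view of a multi-valued keyword field (simulation only)."""
    return sorted(set(values))


jack_friends = ["Jill", "John"]
max_friends = ["John", "John"]

print(len(doc_values_view(jack_friends)))  # 2 distinct values, passes "> 1"
print(len(doc_values_view(max_friends)))   # 1 distinct value, fails "> 1"
print(len(max_friends))                    # 2 entries in the original array
```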
Since my index is relatively small and performance is not critical, I would like to traverse the source instead of the inverted index:
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "ctx._source.friends.length > 1"
}
}
}
]
}
}
}
But according to https://github.com/elastic/elasticsearch/issues/20068, the source is accessible only when updating, not when searching, so I cannot do this.
One obvious solution seems to be to take the length of the field and store it in the index, as something like friends_count: 2, and then filter on that. But that requires reindexing, and this also looks like something that should be solvable in some obvious way that I am missing.
Thanks a lot.
Runtime fields are a new feature in ES 7.11. A runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
You can find more information about runtime fields in the Elasticsearch documentation. To use them here, you can do something like this:
Index Time:
PUT my-index/
{
"mappings": {
"runtime": {
"friends_count": {
"type": "long",
"script": {
"source": "emit(params._source['friends'].size())"
}
}
},
"properties": {
"@timestamp": {"type": "date"}
}
}
}
You can also define runtime fields at search time; see the Elasticsearch documentation for more information.
Search Time
GET my-index/_search
{
"runtime_mappings": {
"friends_count": {
"type": "long",
"script": {
"source": "emit(params._source['friends'].size())"
}
}
}
}
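To turn the search-time definition into an actual filter, the runtime field can be queried like any other numeric field. The Python sketch below builds such a request body; it assumes that the runtime script reads the original array through params._source and reports its value with emit, and that every document has a friends array. The helper name people_with_min_friends is hypothetical.

```python
import json


def people_with_min_friends(min_friends: int) -> dict:
    """Search body that filters on a runtime field counting entries in `friends`."""
    return {
        "runtime_mappings": {
            "friends_count": {
                "type": "long",
                # Assumption: `friends` is always present and always an array.
                "script": {"source": "emit(params._source['friends'].size())"},
            }
        },
        # The runtime field is queried like a regular numeric field.
        "query": {"range": {"friends_count": {"gte": min_friends}}},
        "fields": ["friends_count"],
    }


print(json.dumps(people_with_min_friends(2), indent=2))
```

Because the script loads _source for every candidate document, this is likely to be slow on large indices, which matches the constraint stated in the question that the index is small.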
Update:
POST mytest/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.arrayLength = ctx._source.friends.size()"
}
}
You can update all of your documents with the query above and then adjust your search query to filter on the new arrayLength field.
For everyone wondering about the same issue: I think @Kaveh's answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that the source is created after the query is performed, so you cannot access the source for the purpose of filtering a query.
This leaves you with two options:
filter the results at the application level (an ugly and slow solution)
save the field length in a separate field, such as friends_count
Possibly there is another option I don't know about.
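The second option could look roughly like the following Python sketch: the count is computed on the application side just before indexing, and searches then use an ordinary range filter. Both helper names are invented for illustration.

```python
def with_friends_count(doc: dict) -> dict:
    """Return a copy of `doc` with a `friends_count` field added."""
    friends = doc.get("friends") or []
    if isinstance(friends, str):
        # A single value may be stored as a bare string instead of a list.
        friends = [friends]
    return {**doc, "friends_count": len(friends)}


def min_friends_query(min_friends: int) -> dict:
    """Search body that filters on the stored count, with no script involved."""
    return {
        "query": {
            "bool": {
                "filter": [{"range": {"friends_count": {"gte": min_friends}}}]
            }
        }
    }


print(with_friends_count({"name": "Max", "friends": ["John", "John"]}))
print(min_friends_query(2))
```

Duplicates are counted here because the list comes straight from the application, not from de-duplicated doc values.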
I'm using cosineSimilarity in elasticsearch for searching documents and the query looks like the following:
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}}
The issue here is that I get all the documents back regardless of their similarity score. I want to filter the results by a threshold value.
I tried using bool with the following script:
query = {
"query": {
"bool" : {
"must": {
"match_all": {}
},
"filter" : {
"script" : {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0 > 1.4",
"params": {
"queryVector": list(feat)
}
}
}
}
}
}
But this throws an error:
RequestError(400, 'x_content_parse_exception', '[source] query malformed, no start_object after query name')
From Text similarity search with vector fields
Important limitations
The script_score query is designed to wrap a restrictive query, and modify the scores of the documents it returns. However, we’ve provided a match_all query, which means the script will be run over all documents in the index. This is a current limitation of vector similarity in Elasticsearch — vectors can be used for scoring documents, but not in the initial retrieval step. Support for retrieval based on vector similarity is an important area of ongoing work.
EDIT
Adding min_score to the request filters out documents whose calculated score falls below the threshold, after the match_all has run.
{
"min_score": 1.4,
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}
}
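Since the question builds the request in Python, here is a sketch of a helper that derives min_score from a cosine threshold. The script adds 1.0 to keep scores non-negative, so a minimum cosine similarity of 0.4 corresponds to a min_score of 1.4. The function names are hypothetical, and cosine_similarity is only a local reference implementation for checking thresholds by hand, not the function Elasticsearch runs.

```python
import math


def cosine_similarity(a, b) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def similarity_search(query_vector, min_cosine: float) -> dict:
    """script_score request that drops hits below `min_cosine`."""
    return {
        # The script shifts the cosine by 1.0, so the threshold shifts too.
        "min_score": min_cosine + 1.0,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
                    "params": {"queryVector": list(query_vector)},
                },
            }
        },
    }


body = similarity_search([0.1, 0.2, 0.3], min_cosine=0.4)
print(body["min_score"])
```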
I am trying to figure out how to solve these two problems that I have with my ES 5.6 index.
"mappings": {
"my_test": {
"properties": {
"Employee": {
"type": "nested",
"properties": {
"Name": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"Surname": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
}
I need to create two separate scripted filters:
1 - Filter documents where size of employee array is == 3
2 - Filter documents where the first element of the array has "Name" == "John"
I was trying to make some first steps, but I am unable to iterate over the list. I always have a null pointer exception error.
{
"bool": {
"must": {
"nested": {
"path": "Employee",
"query": {
"bool": {
"filter": [
{
"script": {
"script" : """
int array_length = 0;
for(int i = 0; i < params._source['Employee'].length; i++)
{
array_length +=1;
}
if(array_length == 3)
{
return true
} else
{
return false
}
"""
}
}
]
}
}
}
}
}
}
As Val noticed, you can't access the _source of documents in script queries in recent versions of Elasticsearch.
But Elasticsearch allows you to access _source in the score context.
So a possible workaround (but you need to be careful about performance) is to use a scripted score combined with a min_score in your query.
You can find an example of this behavior in the Stack Overflow post "Query documents by sum of nested field values in elasticsearch".
In your case, a query like this can do the job:
POST <your_index>/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": """
if (params["_source"]["Employee"].length === params.nbEmployee) {
def firstEmployee = params._source["Employee"].get(0);
if (firstEmployee.Name == params.name) {
return 1;
} else {
return 0;
}
} else {
return 0;
}
""",
"params": {
"nbEmployee": 3,
"name": "John"
}
}
}
}
]
}
}
}
The number of employees and the first name should be passed in params to avoid recompiling the script for every variation of this query.
But remember that this can be very heavy on your cluster, as Val already mentioned. You should narrow the set of documents to which the script is applied by adding filters to the function_score query (match_all in my example).
In any case, this is not how Elasticsearch is meant to be used, and you can't expect great performance from such a hacked query.
1 - Filter documents where size of employee array is == 3
For the first problem, the best thing to do is to add another root-level field (e.g. NbEmployees) that contains the number of items in the Employee array so that you can use a range query and not a costly script query.
Then, whenever you modify the Employee array, you also update that NbEmployees field accordingly. Much more efficient!
2 - Filter documents where the first element of the array has "Name" == "John"
Regarding this one, you need to know that nested fields are separate (hidden) documents in Lucene, so there is no way to get access to all the nested docs at once in the same query.
If you know you need to check the first employee's name in your queries, just add another root-level field FirstEmployeeName and run your query on that one.
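A sketch of what this denormalization could look like on the indexing side, in Python. The field names NbEmployees and FirstEmployeeName come from the answer above; the helper names are invented, and the query assumes FirstEmployeeName is mapped as a keyword field.

```python
def add_employee_summary(doc: dict) -> dict:
    """Return a copy of `doc` with root-level summary fields for `Employee`."""
    employees = doc.get("Employee") or []
    summary = {"NbEmployees": len(employees)}
    if employees:
        # Copy the first employee's name to the root of the document.
        summary["FirstEmployeeName"] = employees[0].get("Name")
    return {**doc, **summary}


def summary_filter(nb_employees: int, first_name: str) -> dict:
    """Search body that uses only the root-level fields, with no script."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"NbEmployees": nb_employees}},
                    {"term": {"FirstEmployeeName": first_name}},
                ]
            }
        }
    }


doc = {"Employee": [{"Name": "John"}, {"Name": "Ann"}, {"Name": "Bob"}]}
print(add_employee_summary(doc))
print(summary_filter(3, "John"))
```

The same function has to run on every update that touches the Employee array, otherwise the summary fields drift out of sync.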
Is there a way in elasticsearch where I can cast a string to a long value at query time?
I have something like this in my document:
"attributes": [
{
"key": "age",
"value": "23"
},
{
"key": "name",
"value": "John"
},
],
I would like to write a query to get all the persons that have an age > 23. For that I need to cast the value to an int such that I can compare it when the key is age.
The above document is an example very specific to this problem.
I would greatly appreciate your help.
Thanks!
You can use scripting for that
POST /index/type/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "foreach(attr : _source['attributes']) {if ( attr['key']=='age') { return attr['value'] > ageValue;} } return false;",
"params" : {
"ageValue" : 23
}
}
},
"query": {
"match_all": {}
}
}
}
}
UPD: Note that dynamic scripting must be enabled in elasticsearch.yml.
Also, I suppose you can achieve better query performance by refactoring your document structure and applying an appropriate mapping for the age field.
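As an illustration of that refactoring, here is a Python sketch that flattens the key/value attribute list into typed top-level fields before indexing, so that age becomes a real number and a plain range query works. The helper names and the numeric_keys parameter are made up, and the sketch assumes age is then mapped as a numeric field.

```python
def flatten_attributes(doc: dict, numeric_keys=("age",)) -> dict:
    """Replace the `attributes` key/value list with typed top-level fields."""
    flat = {k: v for k, v in doc.items() if k != "attributes"}
    for attr in doc.get("attributes", []):
        key, value = attr["key"], attr["value"]
        # Convert the string to an int once, at index time.
        flat[key] = int(value) if key in numeric_keys else value
    return flat


def older_than(age: int) -> dict:
    """Search body for documents whose numeric `age` is greater than `age`."""
    return {"query": {"range": {"age": {"gt": age}}}}


doc = {
    "attributes": [
        {"key": "age", "value": "23"},
        {"key": "name", "value": "John"},
    ]
}
print(flatten_attributes(doc))
print(older_than(23))
```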