Elasticsearch scripted query painless exponential function - elasticsearch

With Elasticsearch and painless is there a way to implement an exponential function? I can't seem to find anything. I have something like this.
bdy = {
"from" : 0,
"size" : 10,
"query": {
"function_score": {
"query": {
"bool": {
"must": must_terms
}
},
"script_score": {
"script": {
"lang": "expression",
"source": "doc['release_year'].value"
}
}
}
}
}
I want to add some more complex math in the source field, like this.
"source": "Math.exponential(1/doc['release_year'].value)"
Is that possible? Or is there another scripting language within Elasticsearch that can do this?
UPDATE
Actually, it looks like I can use the following:
"lang": "expression",
"source": "_score/10 + 1/(1+ exp(-(doc['release_year'].value*a)))"
http://lucene.apache.org/core/6_0_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html
If anyone has other options that would be cool.

You can do it in Painless the same way with Math.exp():
"source": "_score/10 + 1/(1+ Math.exp(-(doc['release_year'].value*a)))"
See the full Painless API here: https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-api-reference.html
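If you want to sanity-check the formula outside Elasticsearch, here is a minimal Python sketch that mirrors the expression and builds the same request body with the script switched to Painless. It assumes the constant a is passed in through params and read as params.a (the snippets above leave a undefined). The helper names sigmoid_score and build_body are made up for illustration and are not part of any Elasticsearch client.

```python
import math


def sigmoid_score(score, release_year, a):
    """Python mirror of: _score/10 + 1/(1 + Math.exp(-(release_year * a)))."""
    return score / 10 + 1 / (1 + math.exp(-(release_year * a)))


def build_body(must_terms, a):
    """Build a function_score request whose script is written in Painless.

    Assumption: `a` is supplied via script params and read as params.a.
    """
    source = (
        "_score/10 + 1/(1 + Math.exp(-(doc['release_year'].value * params.a)))"
    )
    return {
        "from": 0,
        "size": 10,
        "query": {
            "function_score": {
                "query": {"bool": {"must": must_terms}},
                "script_score": {
                    "script": {
                        "lang": "painless",
                        "source": source,
                        "params": {"a": a},
                    }
                },
            }
        },
    }
```

Passing a through params rather than hard-coding it in the source lets Elasticsearch reuse the compiled script when only the constant changes.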

Related

Is it possible to access a query term in a script field?

I would like to construct an elasticsearch query in which I can search for a term and on-the-fly compute a new field for each found document, which is calculated based on some existing fields as well as the query term. Is this possible?
For example, let's say in my ES query I am searching for documents which have the keyword "amsterdam" in the "text" field.
"filter": [
{
"match_phrase": {
"text": {
"query": "amsterdam"
}
}
}]
Now I would also like to have a script field in my query, which computes some value based on other fields as well as the query.
So far, I have only found how to access the other fields of a document, using doc['someOtherField'], for example
"script_fields" : {
"new_field" : {
"script" : {
"lang": "painless",
"source": "if (doc['citizens'].value > 10000) { return 'large'; } return 'small';"
}
}
}
How can I integrate the query term, e.g. if I wanted to add to the if statement "if the query term starts with a-e"?
You're on the right track but script_fields are primarily used to post-process your documents' attributes — they won't help you filter any docs because they're run after the query phase.
With that being said, you can use scripts to filter your documents through script queries. Before you do that, though, you should explore alternatives.
In other words, scripts should be used when all other mechanisms and techniques have been exhausted.
Back to your example. I see three possibilities off the top of my head.
Match phrase prefix queries as a group of bool-should subqueries:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase_prefix": {
"text_field": "a"
}
},
{
"match_phrase_prefix": {
"text_field": "b"
}
},
{
"match_phrase_prefix": {
"text_field": "c"
}
},
... till the letter "e"
]
}
}
]
}
}
}
A regexp query:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"regexp": {
"text_field": "[a-e].+"
}
}
]
}
}
}
Script queries using .charAt comparisons:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
char c = doc['text_field.keyword'].value.charAt(0);
return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);
""",
"params": {
"gte": "a",
"lte": "e"
}
}
}
}
]
}
}
}
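For the third option, here is a rough Python sketch of what the script does per document, plus a helper that builds the same request body with the range boundaries passed as params. The function names are hypothetical, and the sketch assumes the field holds a single non-empty keyword value (an empty string would fail on the first-character lookup, just as charAt(0) would in the script).

```python
def first_char_in_range(value, gte, lte):
    """Mirror of the Painless check: is the first character within [gte, lte]?"""
    c = value[0]
    return gte[0] <= c <= lte[0]


def build_script_query(field, gte, lte):
    """Build the script query from the answer for an arbitrary keyword field."""
    source = (
        "char c = doc['" + field + "'].value.charAt(0); "
        "return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);"
    )
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "script": {
                            "script": {
                                "source": source,
                                "params": {"gte": gte, "lte": lte},
                            }
                        }
                    }
                ]
            }
        }
    }
```

For example, first_char_in_range("amsterdam", "a", "e") is true while first_char_in_range("zurich", "a", "e") is false.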
If you're relatively new to ES and would love to see real-world examples, check out my recently released Elasticsearch Handbook. One chapter is dedicated to scripting and as it turns out, you can achieve a lot with scripts (if of course executed properly).

Filter by a threshold value of script_score in elasticsearch

I'm using cosineSimilarity in elasticsearch for searching documents and the query looks like the following:
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}}
The issue here is that I get all the documents back regardless of their similarity score. I want to filter my results based on a threshold value.
I tried using bool with following script:
query = {
"query": {
"bool" : {
"must": {
"match_all": {}
},
"filter" : {
"script" : {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0 > 1.4",
"params": {
"queryVector": list(feat)
}
}
}
}
}
}
But this throws an error:
RequestError(400, 'x_content_parse_exception', '[source] query malformed, no start_object after query name')
From Text similarity search with vector fields
Important limitations
The script_score query is designed to wrap a restrictive query, and modify the scores of the documents it returns. However, we’ve provided a match_all query, which means the script will be run over all documents in the index. This is a current limitation of vector similarity in Elasticsearch — vectors can be used for scoring documents, but not in the initial retrieval step. Support for retrieval based on vector similarity is an important area of ongoing work.
EDIT
Adding min_score to the request filters out documents whose calculated score is below the threshold. The script is still run over every document returned by match_all.
{
"min_score": 1.4,
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}
}
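To make the effect of min_score concrete, here is a small pure-Python sketch of the same scoring and cutoff. It is only an illustration of the logic, not how Elasticsearch implements it. The helper names and the in-memory docs mapping are assumptions made for the example.

```python
import math


def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def filter_by_min_score(query_vector, docs, min_score):
    """Score every doc as cosine + 1.0, then keep those with score >= min_score.

    `docs` is a hypothetical mapping of document id -> title vector.
    """
    scored = {
        doc_id: cosine_similarity(query_vector, vector) + 1.0
        for doc_id, vector in docs.items()
    }
    return {doc_id: s for doc_id, s in scored.items() if s >= min_score}
```

Because of the + 1.0 shift, scores range from 0 to 2, so a min_score of 1.4 corresponds to a raw cosine similarity of at least 0.4.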

Fast elasticsearch CASE WHEN THEN ELSE equivalent?

I need to build an exclusive bucketing aggregation in Elasticsearch (i.e. each document is assigned to the FIRST bucket whose criterion it meets, not to ALL buckets that match, since the filters might overlap; this is the same behavior as CASE WHEN THEN ELSE in SQL environments). Currently I am using a Filters Aggregation coupled with a Bool Query/Filter to achieve what I want. The idea is to use the "must" and "must_not" parts of the Bool Query, where the "must" is my filter and the "must_not" is the collection of all the filters that have already been used in earlier buckets. An example would be:
GET _search
{
"query":{"match_all":{}},
"size":0,
"aggs":{
"bin_1": {
"filter": {
"bool": {
"must": { <filter1> },
"must_not": { <empty> }
}
}
},
"bin_2": {
"filter": {
"bool": {
"must": { <filter2> },
"must_not": { <filter1> }
}
}
},
"bin_3": {
"filter": {
"bool": {
"must": { <filter3> },
"must_not": { <filter1>, <filter2> }
}
}
},
"bin_else": {
"filter": {
"bool": {
"must": { <empty> },
"must_not": { <filter1>, <filter2>, <filter3> }
}
}
}
}
}
In a relational approach, the same would be achieved by the CASE WHEN clause like so:
CASE WHEN <filter1> THEN <bin_1>
WHEN <filter2> THEN <bin_2>
WHEN <filter3> THEN <bin_3>
ELSE <bin_else>
END
The problem with this approach is that it gets slower and slower the more buckets I add (in my real case I even have nested buckets). Is there any language support for exclusive bucketing like this in Elastic? Or any other faster approach that would yield the same results?
Thank you!
I think the solution would be to use script fields. A script uses if/else logic, so the earlier filters do not have to be repeated as extra conditions. I do not know what kind of filter you are using, but I think it should be possible to implement any of them this way. Here is an equivalent of
SELECT
CASE WHEN <filter1> THEN <bin_1>
WHEN <filter2> THEN <bin_2>
ELSE <bin_else>
END as binning
FROM SOMETHING
implemented using script fields in the Painless language. Script fields are described here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html
and painless here:
https://www.elastic.co/guide/en/elasticsearch/painless/5.6/painless-examples.html
GET _search
{
"query" : { "match_all": {} },
"script_fields" : {
"binning" : {
"script" : {
"lang": "painless",
"source": "if (<filter1>) {return <bin_1>;} else if (<filter2>) {return <bin_2>;} else {return <bin_else>;}"
}
}
}
}
where the "filter" would be something like: doc['my_field'].value == "value1" where 'my_field' is the field that you use in the filter.
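The first-match logic itself is easy to sketch outside Elasticsearch. The Python below is only an illustration of the CASE WHEN semantics (the first matching rule wins, otherwise the else bin). The names assign_bin and count_bins, and the idea of passing rules as (predicate, label) pairs, are assumptions made for the example.

```python
from collections import Counter


def assign_bin(doc, rules, default="bin_else"):
    """Return the label of the FIRST rule whose predicate matches the doc."""
    for predicate, label in rules:
        if predicate(doc):
            return label
    return default


def count_bins(docs, rules, default="bin_else"):
    """Count how many docs land in each exclusive bin."""
    return Counter(assign_bin(doc, rules, default) for doc in docs)
```

With overlapping rules such as my_field > 10000 followed by my_field > 1000, a document with my_field = 50000 lands only in the first bin, which is the exclusive behavior the question asks for.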

How to reject a document using groovy script scoring in elasticsearch

I have a Groovy Script that calculates a score of a document using a set of conditions. What value should I return such that a document will not show up in the search results? Is there such a value or must I apply a filtering script on the data afterwards?
You can use min_score to filter out documents that do not meet the scoring criteria.
Here is a sample showing how to use script_score together with min_score:
{
"min_score": 0,
"query": {
"function_score": {
"functions": [
{
"script_score": {
"params": {
"cutoff": 3
},
"script": "_score < cutoff ? -1 : 1"
}
}
],
"boost_mode": "replace"
}
}
}

Sum-aggregation script for term frequencies without dynamic scripting

I am trying to evaluate a web application for my master's thesis. For this I want to run a user study, where I prepare the data in Elastic Found and send my web application to the testers. As far as I know, Elastic Found does not allow dynamic scripting for security reasons. I am trying to reformulate the following dynamic script query:
GET my_index/document/_search
{
"query": {
"match_all":{}
},
"aggs": {
"stadt": {
"sum": {
"script": "_index['textBody']['frankfurt'].tf()"
}
}
}
}
This query sums up all term frequencies in the document field textBody for the term frankfurt.
In order to reformulate the query without dynamic scripting, I've taken a look at Groovy scripts without dynamic scripting, but I still get parsing errors.
My approach to this was:
GET my_index/document/_search
{
"query": {
"match_all":{}
},
"aggs": {
"stadt": {
"sum": {
"script": {
"script_id": "termFrequency",
"lang" : "groovy",
"params": {
"term" : "frankfurt"
}
}
}
}
}
}
and the file termFrequency.groovy in the scripts directory:
_index['textBody'][term].tf()
I get the following parsing error:
Parse Failure [Unexpected token START_OBJECT in [stadt].]
This is the correct syntax, assuming your file is inside the config/scripts directory:
{
"query": {
"match_all": {}
},
"aggs": {
"stadt": {
"sum": {
"script_file": "termFrequency",
"lang": "groovy",
"params": {
"term": "frankfurt"
}
}
}
},
"size": 0
}
Also, term should be a variable rather than a string literal, so the script should be
_index['textBody'][term].tf()
Hope this helps!
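If you want to cross-check the aggregation result on a small sample, here is a rough Python sketch of what the sum is computing. It assumes simple lowercased whitespace tokenization, which will not match your index's analyzer exactly, and the helper names are hypothetical.

```python
def term_frequency(text, term):
    """Count occurrences of `term` in lowercased, whitespace-split text."""
    return text.lower().split().count(term.lower())


def sum_term_frequency(docs, term, field="textBody"):
    """Sum the per-document term frequency, like the `sum` aggregation above."""
    return sum(term_frequency(doc.get(field, ""), term) for doc in docs)
```

Documents that lack the field contribute 0 to the sum in this sketch.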

Resources