Elasticsearch script_score with array - elasticsearch

I am trying to use script_score to update the score based on a json of ID values. The score should multiply the original score by the factor listed in params.
"script_score": {
"params": {
"ranking": {
"1": "1.3403946161270142",
"3": "1.3438195884227753"
}
},
"script": "_score * ranking[doc['ID'].value]"
}
I am getting the following error:
nested: QueryParsingException[[index name] script_score the script could not be loaded]; nested: CompileException[[Error: unbalanced braces [ ... ]]\n[Near : {... _score * ranking[doc['ID'].value] ....}]\n ^\n[Line: 1, Column: 29]]; }]"
If I manually specify an ID for example _score * ranking['1'], it works fine. Also if I use the ID directly it works, but not if I use the ID value as the index. I should note that the ID is an integer. Can anyone help me solve this? Additionally, how would this work if the ID isn't in the ranking list? Would it treat it as score='_score'?

You ranking param is not an array but a map. The type used for the ID field should match the type used for the key in your map, and the boost should be a number, not a string.
Here is a document:
{
"ID" : "1"
}
and here is the updated query:
GET /test/_search
{
"query": {
"function_score": {
"functions": [
{
"script_score": {
"params": {
"ranking": {
"1": 1.3403946161270142,
"3": 1.3438195884227753
}
},
"script": "_score * ranking.get(_doc['ID'].value)"
}
}
]
}
}
}
The current script doesn't handle cases where the entry is not within the parama map, case that leads to a NullPointerException.
That said I think this boosting method will not scale as you need to have an entry per document in your params, which is hardly maintainable. Having the rank within each of the document seems better although you'd need to update them every time you want to change it.

Related

How to compare two date fields in same document in elasticsearch

In my elastic search index, each document will have two date fields createdDate and modifiedDate. I'm trying to add a filter in kibana to fetch the documents where the modifiedDate is greater than createdDate. How to create this filter in kibana?
Tried Using below query instead of greater than it is considering as gte and fetching all records
GET index/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script" : {
"inline" : "doc['modifiedTime'].value.getMillis() > doc['createdTime'].value.getMillis()",
"lang" : "painless"
}
}
}
}
}
}
There are a few options.
Option A: The easiest and most performant one is to store the difference of the two fields inside a new field of your document, e.g.
{
"createDate": "2022-01-11T12:34:56Z",
"modifiedDate": "2022-01-11T12:34:56Z",
"diffMillis": 0
}
{
"createDate": "2022-01-11T12:34:56Z",
"modifiedDate": "2022-01-11T12:35:58",
"diffMillis": 62000
}
Then, in Kibana you can query on diffMillis > 0 and figure out all documents that have been modified after their creation.
Option B: You can use a script query
GET index/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": """
return doc['createdDate'].value.millis < doc['modifiedDate'].value.millis;
"""
}
}
}
}
}
Note: depending on the amount of data you have, this option can potentially have disastrous performance, because it needs to be evaluated on ALL of your documents.
Option C: If you're using ES 7.11+, you can use runtime fields directly from the Kibana Discover view.
You can use the following script in order to add a new runtime field (e.g. name it diffMillis) to your index pattern:
emit(doc['modifiedDate'].value.millis - doc['createdDate'].value.millis)
And then you can add the following query into your search bar
diffMillis > 0

Compare two fields in same document without using script elasticsearch

We are using elastic version 7.10.2. I want to compare two fields from a same document.Scripting is disabled in my organization.
Kindly help in building below query without using script.
Here my query is : nickname is null or nickname is empty or nickname is equal to firstname.
Hard part is how to build query to get the records which have nickname is equal to firstname
Relevant script query to be converted to normal query :
{
"query": {
"bool": {
"must": [{
"script": {
"script": {
"inline": "doc['nickname.keyword'].value==null || doc['nickname.keyword'].value =='' || doc['nickname.keyword'].value == doc['firstname.keyword'].value",
"lang": "painless",
}
}
}]
}
}
}
I see you are already comparing the nickname.keyword to your firstname also mentioned this is the hard part, for this why you need a script, you can simply use the search query on this keyword field and get the result you want.
You can use below term query for it.
{
"query": {
"term": {
"nickname.keyword": {
"value": "your-nickname", // provide your nickname as value
}
}
}
}

Elastic search apply boost based on nested field value

Below is my indexed document
{
"defaultBoostValue":1.01,
"boostDetails": [
{
"Type": "Type1",
"value": 1.0001
},
{
"Type": "Type2",
"value": 1.002
},
{
"Type": "Type3",
"value": 1.0005
}
]
}
i want to apply boost based on value passed, so suppose i pass Type 1 then boost applied will be 1.0001 and if that Type1 does not exist then it will use defaultBoostValue
below is my query which works but quite slow, is there any way to optimize it further
Original question
Above query works but is slow as we are using _source
{
"query": {
"function_score": {
"boost_mode": "multiply",
"functions": [
"script_score": {
"script": {
"source": """
double findBoost(Map params_copy) {
for (def group : params_copy._source.boostDetails) {
if (group['Type'] == params_copy.preferredBoostType ) {
return group['value'];
}
}
return params_copy._source['defaultBoostValue'];
}
return findBoost(params)
""",
"params": {
"preferredBoostType": "Type1"
}
}
}
}
]
}
}
}
I have removed the condition of not having dynamic mapping, if changing the structure of boostDetails mapping can help then I am ok but please explain how it can help and be faster to query also please give mapping types and modified structure if answer contains modifying mapping.
Using dynamic mappings (lots of fields)
It looks like you adjusted the doc structure compared to your original question.
The query above was thought for nested fields which cannot be easily iterated in a script for performance reasons. Having said that, the above is an even slower workaround which accesses the docs' _source and iterates its contents. But keep in mind that it's not recommended to access the _source in scripts!
If your docs aren't nested anymore, you can access the so-called doc values which are much more optimized for query-time access:
{
"query": {
"function_score": {
...
"functions": [
{
...
"script_score": {
"script": {
"lang": "painless",
"source": """
try {
if (doc['boost.boostType.keyword'].value == params.preferredBoostType) {
return doc['boost.boostFactor'].value;
} else {
throw new Exception();
}
} catch(Exception e) {
return doc['fallbackBoostFactor'].value;
}
""",
"params": {
"preferredBoostType": "Type1"
}
}
}
}
]
}
}
}
thus speeding up your function score query.
Alternative using an ordered list of values
Since the nested iteration is slow and dynamic mappings are blowing up your index, you could store your boosts in a standardized ordered list in each document:
"boostValues": [1.0001, 1.002, 1.0005, ..., 1.1]
and keep track of the corresponding boost types' order in the backend where you construct the queries:
var boostTypes = ["Type1", "Type2", "Type3", ..., "TypeN"]
So something like n-hot vectors.
Then, as you construct the Elasticsearch query, you'd look up the array index of the boostValues based on the boostType and pass this array index to the script query from above which'd access the corresponding boostValues doc-value.
This is guaranteed to be faster than _source access. But it's required that you always keep your boostTypes and boostValues in sync -- preferably append-only (as you add new boostTypes, the list grows in one dimension).

How to use multifield search in elasticsearch combining should and must clause

This may be a repeted question but I'm not findin' a good solution.
I'm trying to search elasticsearch in order to get documents that contains:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
the documents must not contain all of them in the same document but the result should contain only documents that are only with those types of events.
And this is simple because elasticsearch helps me with the clause should
which returns exactly what i want.
But then, I want that all the documents must contain another condition that is I want the field result.example.example = 200 and this must be in every single document PLUS the document should be 1 of the previously described "event".
So, for example, a document has "event":"myevent1" and result.example.example = 200 another one has "event":"myevent2" and result.example.example = 200 etc etc.
I've tried this configuration:
{
"query": {
"bool": {
"must":{"match":{"operation.result.http_status":200}},
"should": [
{
"match": {
"event": "bank.account.patch"
}
},
{
"match": {
"event": "bank.account.add"
}
},
{
"match": {
"event": "bank.user.patch"
}
}
]
}
}
}
but is not working 'cause I also get documents that not contain 1 of the should field.
Hope I explained well,
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status":200 and to boost those that have a matching event type.
You're looking to combine two must queries
one that matches one of your event types,
one for your other condition
The event clause accepts multiple values and those values are exact matches : you're looking for a terms query.
Try
{
"query": {
"bool": {
"must": [
{"match":{"operation.result.http_status":200}},
{
"terms" : {
"event" : [
"bank.account.patch",
"bank.account.add",
"bank.user.patch"
]
}
}
]
}
}
}

Scope Elasticsearch Results to Specific Ids

I have a question about the Elasticsearch DSL.
I would like to do a full text search, but scope the searchable records to a specific array of database ids.
In SQL world, it would be the functional equivalent of WHERE id IN(1, 2, 3, 4).
I've been researching, but I find the Elasticsearch query DSL documentation a little cryptic and devoid of useful examples. Can anyone point me in the right direction?
Here is an example query which might work for you. This assumes that the _all field is enabled on your index (which is the default). It will do a full text search across all the fields in your index. Additionally, with the added ids filter, the query will exclude any document whose id is not in the given array.
{
"bool": {
"must": {
"match": {
"_all": "your search text"
}
},
"filter": {
"ids": {
"values": ["1","2","3","4"]
}
}
}
}
Hope this helps!
As discussed by Ali Beyad, ids field in the query can do that for you. Just to complement his answer, I am giving an working example. In case anyone in the future needs it.
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"field": "your query"
}
},
{
"ids" : {
"values" : ["0aRM6ngBFlDmSSLpu_J4", "0qRM6ngBFlDmSSLpu_J4"]
}
}
]
}
}
}
You can create a bool query that contains an Ids query in a MUST clause:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-ids-query.html
By using a MUST clause in a bool query, your search will be further limited by the Ids you specify. I'm assuming here by Ids you mean the _id value for your documents.
According to es doc, you can
Returns documents based on their IDs.
GET /_search
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
}
}
With elasticaBundle symfony 5.2
$query = new Query();
$IdsQuery = new Query\Ids();
$IdsQuery->setIds($id);
$query->setQuery($IdsQuery);
$this->finder->find($query, $limit);
You have two options.
The ids query:
GET index/_search
{
"query": {
"ids": {
"values": ["1, 2, 3"]
}
}
}
or
The terms query:
GET index/_search
{
"query": {
"terms": {
"yourNonPrimaryIdField": ["1", "2","3"]
}
}
}
The ids query targets the document's internal _id field (= the primary ID). But it often happens that documents contain secondary (and more) IDs which you'd target thru the terms query.
Note that if your secondary IDs contain uppercase chars and you don't set their field's mapping to keyword, they'll be normalized (and lowercased) and the terms query will appear broken because it only works with exact matches. More on this here: Only getting results when elasticsearch is case sensitive

Resources