Elasticsearch runs groovy scripts twice, is it a bug? - elasticsearch

I found some unexpected behaviour with a script query: the script executes twice for a single query.
My configuration: Elasticsearch version 2.4.6 (the issue remains in Elasticsearch 5.6).
My elasticsearch.yml:
script.indexed: true
The steps to reproduce the issue:
1) I have one simple document, doc1.json:
{
  "id": "1",
  "tags": "t1"
}
2) Insert doc1 in Elastic:
http PUT localhost:9200/default/type1/1 #doc1.json
3) I have one simple groovy script, script1.json (it just returns the score and prints it):
{
  "script": "println('Score is ' + _score * 1.0 + ' for document ' + doc['id'] + ' at ' + DateTime.now().getMillis()); return _score;"
}
4) Register script1:
http POST 'localhost:9200/_scripts/groovy/script1' #script1.json
5) Execute this query_with_script.json:
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "match": {
              "tags": {
                "query": "t1",
                "type": "boolean"
              }
            }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "id": "script1",
              "lang": "groovy"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "explain": true
}
http GET 'localhost:9200/default/type1/_search' #query_with_script.json
6) Why do the Elasticsearch logs show that the script is executed at two different times? Is it a bug?
Score is 0.19178301095962524 for document [1] at 1516586818596
Score is 0.19178301095962524 for document [1] at 1516586818606
Thanks a lot!

You should probably remove the explain flag: with "explain": true, Elasticsearch evaluates the scoring (including the script) a second time to build the explanation, which is most likely why the script runs twice.
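If the explanation phase is indeed the cause, the same request with the explain flag dropped should log only one script execution per matching document:

```json
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "match": { "tags": "t1" }
          }
        }
      },
      "functions": [
        {
          "script_score": {
            "script": { "id": "script1", "lang": "groovy" }
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
```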

Related

How to prevent "Too many dynamic script compilations within" error with search templates?

I use a search template with the "mustache" language to build dynamic queries according to different parameters.
When I frequently modify the parameter values of this request, I get this error message:
[script] Too many dynamic script compilations within, max: [150/5m];
I think that each time the parameter values change, the script is recompiled, but if the values are identical then Elasticsearch uses a cache so as not to recompile the script.
In our case, the cache cannot be used because the values are different on every request (local timestamp, variable distance, random seed generated by a client...).
To prevent this error, I changed the cluster settings to increase the max_compilations_rate value, at the cost of higher server load.
Is there a way to limit recompilation?
My "big" script computes a score according to many parameters and runs on Elasticsearch 8.2.
The structure of the script is as follows:
{
  "script": {
    "lang": "mustache",
    "source": "...",
    "params": { ... }
  }
}
The source code looks like this:
{
  "runtime_mappings": {
    "is_opened": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{#user_location}}
    ,"distance": {
      "type": "long",
      "script": {
        "source": " ... "
      }
    }
    {{/user_location}}
  },
  "query": {
    "script_score": {
      "query": { ... },
      "script": {
        "source": " ... "
      }
    }
  },
  "fields": [
    "is_opened"
    {{#user_location}},"distance"{{/user_location}}
  ],
  ...
}
I use mustache variables (with double curly brackets) everywhere in the script:
in the computed fields ("is_opened", "distance")
in queries and filters
in the script score
Is there a way to "optimize" the internal scripts (the computed-field scripts and the score script) so that compilation does not restart each time the parameter values change?
To avoid recompilation, I needed to use "params" inside the embedded runtime-field scripts and inside the query score script.
I had used parameters for the main script written in "mustache", but I had not done so for the embedded scripts written in "painless".
Thanks @Val for giving me the hint.
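The fix can be sketched as follows (field and parameter names here are hypothetical): instead of interpolating a changing value directly into the painless source, which makes every request a brand-new script to compile, pass the value through "params" so the compiled source stays identical and is served from the compilation cache:

```json
{
  "script": {
    "lang": "painless",
    "source": "doc['price'].value * params.factor",
    "params": { "factor": 1.2 }
  }
}
```

Only the "params" object changes between requests; the "source" string, which is the cache key for compilation, does not.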

How do I do an Anti Match Pattern on Keyword Field Elasticsearch Query 6.4.2

The problem:
Our log data has 27-34 million entries for /event-heartbeat.
I need to filter those entries out to see just the viable log messages in Kibana.
Using Kibana filters with wildcards does not work, so I think I will have to write Query DSL for Elasticsearch 6.4.2 to filter out the event heartbeats.
I have been looking and I can't find any good explanation of how to do an anti-pattern match, i.e. how to search for all entries that don't contain /event-heartbeat in the message.
Here is the log message:
@timestamp: June 14th 2019, 12:39:09.225
host.name: iislogs-production
source: C:\inetpub\logs\LogFiles\W3SVC5\u_ex19061412.log
offset: 83,944,181
message: 2019-06-14 19:39:06 0.0.0.0 GET /event-heartbeat id=Budrug2UDw 443 - 0.0.0.0 - - 200 0 0 31
prospector.type: log
input.type: log
beat.name: iislogs-production
beat.hostname: MYHOSTNAME
beat.version: 6.4.2
_id: yg6AV2sB0_n
_type: doc
_index: iislogs-production-6.4.2-2019.06.14
_score: -
Message is a keyword field, so I can do painless scripting on it.
I've used the Lucene syntax:
NOT message: "*/event-heartbeat*"
This is the anti-pattern query that the Kibana filter generates:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "message": "*event-heartbeat*"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
I've tried the solution proposed below by huglap. I also adjusted my query based on his comment and tried it two ways: with term instead of match, and with match as proposed, since the field technically is a keyword (which is also why I can do painless scripting on it). Both queries still return event-heartbeat log entries.
Here are the two queries I tried based on the proposed solution:
GET /iislogs-production-*/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
GET /iislogs-production-*/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "match": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
Index Mapping:
https://gist.github.com/zukeru/907a9b2fa2f0d6f91a532b0865131988
Have you thought about a 'must_not' bool query?
Since you're going for the whole set and not really caring about shaping the relevancy function, I suggest using a filter instead of a query. You'll get better performance.
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "match": {
                "message.whitespace": "event-heartbeat"
              }
            }
          ]
        }
      }
    }
  }
}
This example assumes you are querying against a text field, thus the use of a 'match' query instead of a 'term' one.
You also need to make sure that the field is analyzed (really, tokenized) according to your goals. The dash in your query term will create problems if you're using a simple or even a standard analyser: Elasticsearch would break the term into two words. You could try the whitespace analyser, or just remove the dash from the query.
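Since the question states that message is a keyword field (i.e. not analyzed at all), another sketch worth trying is a must_not with a wildcard query directly against that field; this sidesteps the analyzer issue entirely, at the cost of slower wildcard matching (the field path is taken from the question and may differ in your mapping):

```json
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "bool": {
          "must_not": [
            { "wildcard": { "message": "*event-heartbeat*" } }
          ]
        }
      }
    }
  }
}
```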

Optional terms in match_phrase elasticsearch

I am using Elasticsearch 6 and have the following query:
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "fieldOne": {
              "query": "One two three",
              "slop": 10
            }
          }
        },
        {
          "match_phrase": {
            "fieldTwo": {
              "query": "one two three",
              "slop": 10
            }
          }
        }
      ]
    }
  }
}
This works well when I want to match on the two fields with all the terms in the query.
However, if I have a document with only the terms 'one' and 'two' in fieldOne, the above does not return results, as 'three' is also required.
I cannot seem to find a way of making the terms in the query optional; e.g. what I want is to find any of the terms in those two fields.
The reason I went with match_phrase is the slop, which allows the terms to be in different positions in the field, which I also require.
If the order of the terms is not important to you, you don't need match_phrase; a simple match query does the job:
{
  "match": {
    "fieldOne": {
      "query": "one two three"
    }
  }
}
Then if you need at least two terms to match, you can do so using minimum_should_match:
{
  "match": {
    "fieldOne": {
      "query": "one two three",
      "minimum_should_match": 2
    }
  }
}
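Putting the answer together with the original two-field structure (a sketch; slop is dropped because a match query ignores term positions), minimum_should_match of 1 makes any single term in either field sufficient:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "fieldOne": {
              "query": "one two three",
              "minimum_should_match": 1
            }
          }
        },
        {
          "match": {
            "fieldTwo": {
              "query": "one two three",
              "minimum_should_match": 1
            }
          }
        }
      ]
    }
  }
}
```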

What is the Elasticsearch equivalent of a negated 'LIKE' or 'CONTAINS' statement?

I would like to do the Elasticsearch equivalent of the following SQL statement:
SELECT * FROM Users WHERE UserName NOT LIKE '%something%'
I don't care about efficiency or scoring... This only gets executed on occasion. I am using request body syntax.
Use a normal wildcard query, and negate it using a bool must_not query.
GET Users/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "wildcard": {
            "UserName": {
              "value": "*something*"
            }
          }
        }
      ]
    }
  }
}
I'm not sure if Users is the name of your index in your Elasticsearch cluster, but this is the main idea anyway:
You could go with the regexp query, using the complement operator "~" for negation:
GET Users/_search
{
  "query": {
    "regexp": {
      "UserName": {
        "value": ".*~(something).*"
      }
    }
  }
}
For a more useful reference, you can check here.
P.S.: You will not get the best performance, but it will do the job.

Facet postfiltering in Solr (translating from ElasticSearch aggregation postfiltering)

Let's say I have a structure like:
{
  "account_number": 171,
  "balance": 7091,
  "firstname": "Nelda",
  "lastname": "Hopper",
  "age": 39,
  "gender": "M",
  "address": "742 Prospect Place",
  "employer": "Equicom",
  "email": "neldahopper@equicom.com",
  "city": "Finderne",
  "state": "SC"
}
(the data comes from here).
If I write the following query in ElasticSearch:
POST /bank/_search?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "balance": { "gte": 30000 }
          }
        }
      ]
    }
  },
  "fields": ["gender", "balance", "age"],
  "aggs": {
    "age_filter": {
      "filter": {
        "match": {
          "age": "30"
        }
      },
      "aggs": {
        "gender_stats": {
          "terms": { "field": "gender" }
        }
      }
    }
  }
}
I'll get (1) 402 results for the main query and (2) an aggregation over the 18 results that passed the filter "age:30".
I've tried to do a similar trick in Solr 5.1, but the closest I could get was this:
q=balance:[30000%20TO%20*]&facet=true&facet.field=gender&fq=age:30
with the big difference that the filter is now applied to the main query results, so I get only 18 results in total, with the faceting then applied to those.
Is there a way to write a Solr query that is entirely equivalent to the Elasticsearch one? I.e. getting the full results and applying the filtering only to the aggregation/faceting?
NB: I've tried exclusion by tag:
q={!ex=tagForAge}balance:[30000%20TO%20*]&facet=true&facet.field=gender&fq={!tag="tagForAge"}age:30
but the exclusion does not seem to apply to the main query.
Try appending &facet.query=age:30 to your query.
This will generate a facet count from a particular query, which in your case is age:30.
For more information check here.
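Another sketch, assuming the JSON Facet API (introduced around Solr 5.1) is available: a nested query facet narrows only the facet domain, not the main result set, which mirrors the Elasticsearch filter aggregation above:

```
q=balance:[30000 TO *]&json.facet={
  "age_filter": {
    "type": "query",
    "q": "age:30",
    "facet": {
      "gender_stats": { "type": "terms", "field": "gender" }
    }
  }
}
```

The main query still returns all 402 documents, while gender_stats is computed only over the documents that also match age:30.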