Elasticsearch script query array field size comparison

Is it possible to make a comparison with respect to the length of an array field in Elasticsearch?
For instance, the following works (sourceId is a field of type text):
GET /entity_active/_count
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['sourceId'].values.size() > 0",
"lang": "painless"
}
}
}
}
}
}
However, the following does not work (users is an array field):
GET /entity_active/_count
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['users'].values.size() > 0",
"lang": "painless"
}
}
}
}
}
}
The latter returns a response like this:
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:80)",
"doc['users'].values.size() > 0",
" ^---- HERE"
],
"script": "doc['users'].values.size() > 0",
"lang": "painless"
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
Do I need to use an alternative way to make use of such array fields?

Arrays are not indexed as separate array types in Elasticsearch, so you cannot get at the array itself through doc values (i.e. through doc[...]). You can, however, access the _source document in which the array is present (via _source in search scripts or ctx._source in update scripts, depending on the version). Try one of these scripts instead:
_source.users.size() > 0
or
_source.users.length > 0
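For completeness, here is a sketch of the suggested script dropped verbatim into the original _count request. Treat it as a shape only: whether _source is reachable from a script query at all depends on the Elasticsearch version and script language (with Painless, source is normally reached as params._source, and recent releases restrict script queries to doc values entirely):
GET /entity_active/_count
{
  "query": {
    "bool": {
      "must": {
        "script": {
          "script": {
            "inline": "_source.users.size() > 0",
            "lang": "painless"
          }
        }
      }
    }
  }
}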

Related

Elasticsearch Query DSL: Length of field, if field exists

Say I have a field, data.url. Some of our logs contain this field, some do not. I want to return only results where data.url is more than, say, 50 characters long. Really I just need a list of URLs.
I'm trying:
GET _search
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['data.url'].value.length() > 50",
"lang": "painless"
}
}
}
}
}
}
But get mixed errors:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:90)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
"doc['data.url'].value.length() > 50",
" ^---- HERE"
],
"script" : "doc['data.url'].value.length() > 50",
"lang" : "painless",
"position" : {
"offset" : 4,
"start" : 0,
"end" : 35
}
},
or
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.fielddata.ScriptDocValues$Strings.get(ScriptDocValues.java:496)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:503)",
"doc['data.url'].value.length() > 50",
" ^---- HERE"
],
"script" : "doc['data.url'].value.length() > 50",
"lang" : "painless",
"position" : {
"offset" : 15,
"start" : 0,
"end" : 35
}
With
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [data.url] in mapping with types []"
}
and sometimes
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "A document doesn't have a value for a field! Use doc[<field>].size()==0 to check if a document is missing a field!"
}
This field definitely exists; I can see it in the logs, I can search on it in the search field, and a term query works:
GET _search
{
"query": {
"bool": {
"filter": {
"term": {
"data.url": "www.google.com"
}
}
}
}
}
What am I missing?
I'm using Elasticsearch 7.8.
Since you are using version 7.x, you need to use the script query below:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['data.url.keyword'].length > 50",
"lang": "painless"
}
}
}
}
}
}
If the data.url field is already of type keyword, drop the ".keyword" suffix from the field name.
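If the intent is the character length of the URL (rather than the number of values in the field), a variant that also guards against documents missing the field, as the second error suggests, might look like this (still assuming the data.url.keyword sub-field exists in the mapping):
GET _search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['data.url.keyword'].size() > 0 && doc['data.url.keyword'].value.length() > 50",
            "lang": "painless"
          }
        }
      }
    }
  }
}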

Add date field and boolean with ? in name to existing Elasticsearch documents

We need to add two new fields to an existing Elasticsearch (7.9 OSS) instance.
Field 1: Date Field
We want to add an optional date field. It shouldn't have a value upon creation.
How can we do this with _update_by_query?
Tried this:
POST orders/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.new_d3_field",
"lang": "painless",
"type": "date",
"format": "yyyy/MM/dd HH:mm:ss"
}
}
Field 2: Boolean field with ? in name
We want to keep the ? so that it matches the other fields that we already have in ES.
Also worth noting: even after removing the ? and running the request below, the field does not end up as a boolean.
Tried this:
POST orders/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.new_b_field? = false",
"lang": "painless"
}
}
Which gave the error:
{
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"ctx._source.new_b_field? = false",
" ^---- HERE"
],
"script" : "ctx._source.new_b_field? = false",
"lang" : "painless",
"position" : {
"offset" : 25,
"start" : 0,
"end" : 32
}
}
],
"type" : "script_exception",
"reason" : "compile error",
"script_stack" : [
"ctx._source.new_b_field? = false",
" ^---- HERE"
],
"script" : "ctx._source.new_b_field? = false",
"lang" : "painless",
"position" : {
"offset" : 25,
"start" : 0,
"end" : 32
},
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "invalid sequence of tokens near ['='].",
"caused_by" : {
"type" : "no_viable_alt_exception",
"reason" : null
}
}
},
"status" : 400
}
Also tried:
POST orders/_update_by_query?new_b_field%3F=false
Which gave:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "request [/orders/_update_by_query] contains unrecognized parameter: [new_b_field?]"
}
],
"type" : "illegal_argument_exception",
"reason" : "request [/orders/_update_by_query] contains unrecognized parameter: [new_b_field?]"
},
"status" : 400
}
If you want to add two new fields to an existing Elasticsearch index without giving them a value upon creation, you should update its mapping using the put mapping API:
PUT /orders/_mapping
{
"properties": {
"new_d3_field": {
"type": "date",
"format": "yyyy/MM/dd HH:mm:ss"
},
"new_b_field?": {
"type": "boolean"
}
}
}
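With the mapping in place, the fields simply appear once documents containing them are indexed. A purely illustrative document (the ID and values are made up; the date matches the yyyy/MM/dd HH:mm:ss format above):
PUT orders/_doc/1
{
  "new_d3_field": "2021/01/01 12:00:00",
  "new_b_field?": true
}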
If you still want to use _update_by_query, you have to set an initial value; the field is only added once a value is written.
POST orders/_update_by_query?wait_for_completion=false&conflicts=proceed
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.new_d3_field=params.date;ctx._source.new_b_field = params.val",
"lang": "painless",
"params": {
"date": "1980/01/01",
"val": false
}
}
}
The update by query API is used to update documents, so I guess you can't add a field to your schema without updating at least one doc. What you can do is index a dummy doc and update only that specific doc. Something like this:
POST orders/_update_by_query
{
"query": {
"match": {
"my-field":"my-value"
}
},
"script": {
"source": "ctx._source.new_d3_field=params.date;ctx._source.new_b_field = params.val",
"lang": "painless",
"params": {
"date": "1980/01/01",
"val": false
}
}
}
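For illustration, the dummy document that the query above matches could be created first (the index, ID, field name, and value are placeholders taken from the query above) and deleted again once the update has run:
PUT orders/_doc/dummy-for-new-fields
{
  "my-field": "my-value"
}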

For an Elasticsearch index, how to get the documents where an array field has length greater than 0?

In an Elasticsearch index, how do I get the documents where an array field has length greater than 0?
I tried the multiple syntaxes below but didn't get a breakthrough; I got the same error with all of them.
GET http://{{host}}:{{elasticSearchPort}}/student_details/_search
Syntax 1:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['enrolledCourses'].values.length > 0",
"lang": "painless"
}
}
}
}
}
}
Error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [enrolledCourses] in mapping with types []"
}
Syntax 2:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"source": "doc['enrolledCourses'].values.size() > 0",
"lang": "painless"
}
}
}
}
}
}
Error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [enrolledCourses] in mapping with types []"
}
Syntax 3:
{
"query": {
"bool": {
"filter" : {
"script" : {
"script" : "doc['enrolledCourses'].values.size() > 0"
}
}
}
}
}
Error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [enrolledCourses] in mapping with types []"
}
Syntax 4:
{
"query": {
"bool": {
"filter" : {
"script" : {
"script" : "doc['enrolledCourses'].values.length > 0"
}
}
}
}
}
Error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [enrolledCourses] in mapping with types []"
}
Please help me in solving this.
I don't know which version of Elasticsearch you run, so I ran all my tests on the latest version, 7.9.0.
I will use Painless for scripting.
I put two documents into the index test:
PUT test/_doc/1
{
"name": "Vasia",
"enrolledCourses" : ["test1", "test2"]
}
PUT test/_doc/2
{
"name": "Petya"
}
As you can see, one document contains the enrolledCourses field and the second does not.
In Painless you don't need to go through the values field; you can take the length directly, according to the Painless documentation. So I skip values in my script:
GET test/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['enrolledCourses'].length > 0",
"lang": "painless"
}
}
}
]
}
}
}
After running it, I received two different errors:
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:757)",
"org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:116)",
"org.elasticsearch.index.query.QueryShardContext.lambda$lookup$0(QueryShardContext.java:331)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:97)",
"org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:94)",
"java.base/java.security.AccessController.doPrivileged(AccessController.java:312)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:94)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:41)",
"doc['enrolledCourses'].length > 0",
" ^---- HERE"
]
}
and
{
"type" : "illegal_argument_exception",
"reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [enrolledCourses] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
}
Both errors are pretty clear. The first occurs for the document where the field doesn't exist, and the second because Elasticsearch indexed the string array with the default dynamic mapping type text.
Both cases are easy to fix by mapping the enrolledCourses field as keyword.
With an explicit mapping the field is always known, even when a document has no value for it (the doc values are simply empty), and a keyword field can be read from scripts without enabling fielddata.
PUT test
{
"settings": {
"number_of_replicas": 0
},
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"enrolledCourses": {
"type": "keyword"
}
}
}
}
Now I receive the right answer for the query:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"name" : "Vasia",
"enrolledCourses" : [
"test1",
"test2"
]
}
}
]
}
}
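Note that for the student_details index in the question, an existing field cannot be switched from text to keyword in place; the data would have to be reindexed into an index created with the new mapping, for example with the reindex API (the destination index name here is only a placeholder):
POST _reindex
{
  "source": { "index": "student_details" },
  "dest": { "index": "student_details_v2" }
}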

Boosting an Elasticsearch result by 'age' if applicable

I want to search multiple indices in Elasticsearch (news items in search_news and documents in search_documents) and whenever an index has a publicationDate field (news items only), I want to 'sort' this, so I boost newer news items. I am using Elasticsearch 6.8.
I found the script scoring example in https://dzone.com/articles/23-useful-elasticsearch-example-queries (the last one). But it throws errors, and based on the documentation I came up with:
GET /search_*/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must": {
"query_string": {
"query": "Lorem Ipsum"
}
},
"must_not": {
"exists": {
"field": "some_exlusion_field"
}
}
}
},
"script_score": {
"script": {
"params" : {
"threshold": "2019-04-04"
},
"source": "publishDate = doc['publishDate'].value; if (publishDate > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5) } return log(1);"
}
}
}
}
}
This results in the error:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"publishDate = doc['publis ...",
"^---- HERE"
],
"script": "publishDate = doc['publishDate'].value; if (publishDate > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5) } return log(1);",
"lang": "painless"
}
}
I managed to minify the source to:
"source": "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;"
But no success:
"failures" : [
{
"shard" : 0,
"index" : "search_document_page",
"node" : "c0iLpxiJRqmgwS0KY8OybA",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
"org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
"if (doc['publishDate'] > '2019-04-04') { ",
" ^---- HERE"
],
"script" : "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;",
"lang" : "painless",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "No field found for [publishDate] in mapping with types []"
}
}
},
{
"shard" : 0,
"index" : "search_news",
"node" : "c0iLpxiJRqmgwS0KY8OybA",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"if (doc['publishDate'] > '2019-04-04') { ",
" ^---- HERE"
],
"script" : "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;",
"lang" : "painless",
"caused_by" : {
"type" : "class_cast_exception",
"reason" : "Cannot apply [>] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Dates] and [java.lang.String]."
}
}
}
]
}
}
Any suggestions for checking the existence of the field in doc, and for comparing the date properly?
For the existence check (doc here):
if (!doc.containsKey('publishDate')) {
return 1;
}
And for the date comparison, you can try it this way:
if (Date.parse('yyyy-MM-dd', params.threshold).getMillis() > doc['publishDate'].getMillis()) {
return 5;
} else {
return 1;
}
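Putting both pieces together, a sketch of the whole script_score query is below, keeping the question's original intent of boosting documents published after the threshold; the must_not clause from the original query is omitted for brevity. To avoid the brittle date parsing, the threshold is passed as epoch milliseconds in a made-up threshold_millis parameter (1554336000000 is 2019-04-04 UTC); on 6.8 the date value exposes getMillis(), while newer versions would use .toInstant().toEpochMilli() instead:
GET /search_*/_search
{
  "query": {
    "function_score": {
      "query": {
        "query_string": { "query": "Lorem Ipsum" }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "params": { "threshold_millis": 1554336000000 },
          "source": "if (!doc.containsKey('publishDate') || doc['publishDate'].empty) { return 1; } return doc['publishDate'].value.getMillis() > params.threshold_millis ? 5 : 1;"
        }
      }
    }
  }
}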

How to aggregate over 'ip' field using script

I am trying to perform a terms aggregation over a field of type ip using an inline script like this:
{
"aggs": {
"by_ipaddress": {
"terms": {
"script": {
"inline": "doc['ipAddressFrom'].value",
"lang": "painless"
}
}
}
}
}
It throws the following exception:
"reason": {
"type": "script_exception",
"reason": "runtime error",
"caused_by": {
"type": "array_index_out_of_bounds_exception",
"reason": "16"
},
"script_stack": [
"org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:602)",
"org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:152)",
"org.elasticsearch.index.fielddata.ScriptDocValues$Strings.getValue(ScriptDocValues.java:83)",
"doc['ipAddressFrom'].value",
" ^---- HERE"
],
"script": "doc['ipAddressFrom'].value",
"lang": "painless"
}
But when I aggregate over the same field directly:
{
"aggs": {
"by_ipaddress": {
"terms": {
"field": "ipAddressFrom"
}
}
}
}
It works.
The mapping for the field "ipAddressFrom" is:
"ipAddressFrom" : {
"type" : "ip"
}
Please let me know how to use ip fields in a script.
For Elasticsearch 6.x, there is nothing wrong with using the ip type in Painless scripts.
Your aggregation with the inline script doesn't work because the ipAddressFrom field does not exist in some documents.
You can fix the aggregation with something like:
"script": {
"inline": "if (doc.containsKey('ipAddressFrom') && !doc['ipAddressFrom'].empty){ return doc['ipAddressFrom'].value} else {return '0'}",
"lang": "painless"
}
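For reference, the full aggregation request with the guarded script from above ("size": 0 just suppresses the hits; the '0' bucket collects documents that have no ipAddressFrom value):
{
  "size": 0,
  "aggs": {
    "by_ipaddress": {
      "terms": {
        "script": {
          "inline": "if (doc.containsKey('ipAddressFrom') && !doc['ipAddressFrom'].empty){ return doc['ipAddressFrom'].value} else {return '0'}",
          "lang": "painless"
        }
      }
    }
  }
}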
