ElasticSearch painless filter script on text fields not working - elasticsearch

I want to apply an equality (exact match) filter using a Painless script in Elasticsearch. I cannot use a term query directly because the field I want to check is a text field (not keyword), so I tried match_phrase. This is my mapping, which I can't change:
{
"my_index": {
"aliases": {},
"mappings": {
"properties": {
"my_field": {
"type": "text"
}
}
},
"settings": {
"index": {
"max_ngram_diff": "60",
"number_of_shards": "8",
"blocks": {
"read_only_allow_delete": "false",
"write": "false"
},
"analysis": {...}
}
}
}
}
I tried this query, following this guide:
{
"size": 10,
"index": "my_index",
"body": {
"query": {
"bool": {
"should": [{
"match_phrase": {
"my_field": {
"query": "MY_VALUE",
"boost": 1.5,
"slop": 0
}
}
}],
"must": [],
"filter": [{
"script": {
"script": {
"lang": "painless",
"source": "doc['my_field'] == 'MY_VALUE'"
}
}
}],
"minimum_should_match": 1
}
}
}
}
Anyway, I got this error:
body:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
}
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "my_index",
"node": "R99vOHeORlKsk9dnCzcMeA",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
"org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
"doc['my_field'] === 'MY_VALUE'",
" ^---- HERE"
],
"script": "doc['my_field'] === 'MY_VALUE'",
"lang": "painless",
"position": {
"offset": 4,
"start": 0,
"end": 30
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "No field found for [my_field] in mapping with types []"
}
}
}
]
},
"status": 400
}
It seems that doc doesn't contain text fields (I tried with other non-text fields and it works!)
Here they say that:
Doc values are a columnar field value store, enabled by default on all
fields except for analyzed text fields.
And here they say that:
text fields are searchable by default, but by default are not
available for aggregations, sorting, or scripting. Set fielddata=true
on your_field_name in order to load fielddata in memory by uninverting
the inverted index.
But I can't change the mapping.
How can I access text fields in a painless filter script?
(This is similar to ElasticSearch exact match on text field with script but more specific on the filtering script)

ScriptQuery only supports doc_values.
Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of text and annotated_text fields.
As per discussion here
https://github.com/elastic/elasticsearch/issues/30984
Accessing the _source field is slow and something that we don't want to expose in the ScriptQuery, because it would need to be accessed on every document, making the search very inefficient.
So you will need to either add a keyword sub-field to the mapping and reindex the data, or enable fielddata on the text field, which can consume a large amount of memory.
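A minimal sketch of the first option, written as Python dictionaries that mirror the JSON request bodies. The sub-field name raw and both helper names are assumptions for illustration, not something from the thread. Documents indexed before the sub-field exists would still need to be reindexed before they can match.

```python
import json


def keyword_subfield_mapping(field: str, subfield: str = "raw") -> dict:
    """Body for a put-mapping request that adds a keyword sub-field to a text field.

    Assumption: if the existing field declares an analyzer or other settings,
    they have to be repeated here, otherwise the update may be rejected.
    """
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {subfield: {"type": "keyword"}},
            }
        }
    }


def exact_match_filter(field: str, value: str, subfield: str = "raw") -> dict:
    """Search body that filters on the exact, unanalyzed value of the sub-field."""
    return {
        "query": {
            "bool": {
                "filter": [{"term": {f"{field}.{subfield}": value}}]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(keyword_subfield_mapping("my_field"), indent=2))
    print(json.dumps(exact_match_filter("my_field", "MY_VALUE"), indent=2))
```

With a keyword sub-field in place, a plain term filter does the exact match, so the script is no longer needed.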

Related

How to query documents where a rank_features field is missing?

I have an index with a few hundred thousand documents. Some of them have a rank_features field called my_field. I want to retrieve documents without that field.
I tried:
"query": {
"bool": {
"must_not": [
{"exists": {"field":"my_field"}}]
...
But I get the following error:
"error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "failed to create query: [rank_features] fields do not support [exists] queries",
...
The index mapping is defined as follows:
"mappings": {
"dynamic": "strict",
"_routing": {
"required": true
},
"properties": {
"my_field": {
"properties": {
"my_subfield": {
"type": "rank_features"
}
}
...
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"mapping": {
"total_fields": {
"limit": "2000"
}
},
"refresh_interval": "1s",
"number_of_shards": "10",
"blocks": {
"write": "false"
},
Note that despite the mapping being strict, this field was added recently and older documents don't have it.
TL;DR
You are running an exists query against a field that only supports rank_feature queries.
As the documentation for the rank_features field says:
rank_features fields do not support sorting or aggregating and may only be queried using rank_feature queries.
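For illustration, a rank_feature query addresses one named feature inside the field. The sketch below builds such a query and a negated variant as Python dictionaries. The feature key some_feature and the helper names are hypothetical, and whether the negated form fits the original goal is untested here.

```python
def rank_feature_query(feature_path: str) -> dict:
    """Search body matching documents that carry the given feature."""
    return {"query": {"rank_feature": {"field": feature_path}}}


def without_feature_query(feature_path: str) -> dict:
    """Search body that excludes documents carrying the given feature.

    Assumption: negating a rank_feature query leaves the documents that
    lack that one feature; it does not test for the whole field being absent.
    """
    return {
        "query": {
            "bool": {
                "must_not": [{"rank_feature": {"field": feature_path}}]
            }
        }
    }


if __name__ == "__main__":
    # "some_feature" is a made-up key under my_field.my_subfield.
    print(without_feature_query("my_field.my_subfield.some_feature"))
```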

Open Search, exclude field from indexing in mapping

I have the following mapping:
{
"properties": {
"type": {
"type": "keyword"
},
"body": {
"type": "text"
},
"id": {
"type": "keyword"
},
"date": {
"type": "date"
},
}
}
The body field is going to hold an email message. It is very long and I don't want to index it.
What is the proper way to exclude this field from indexing?
What I tried:
enabled: false - as I understand from the documentation, it applies only to object fields, and body is not an object, so I'm not sure whether I can use it.
index: false/'no' - this breaks my code completely and I can no longer run a search. My request contains a query and aggregations with a filter. The filter contains a range:
date: { gte: someDay.getTime(), lte: 'now' }
P.S. someDay is a certain day in my case.
The error I get after applying index: false in mapping to the body field is the following:
{
"error":
{
"root_cause":
[
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards":
[
{
"shard": 0,
"index": "test",
"node": "eehPq21jQsmkotVOqQEMeA",
"reason":
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
}
],
"caused_by":
{
"type": "number_format_exception",
"reason": "For input string: \"now\"",
"caused_by":
{
"type": "number_format_exception",
"reason": "For input string: \"now\""
}
}
},
"status": 400
}
I'm not sure how these are related, as the error is about the date field while I'm adding the index property to the body field.
I'm using: "@opensearch-project/opensearch": "^1.0.2"
Please help me understand:
how to exclude a field from indexing.
why applying index: false to the body field in the mapping breaks the code and produces an error associated with the date field.
You should just modify your mapping to this:
"body": {
"type": "text",
"index": false
}
And it should work
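A hedged sketch of the full mapping from the question with that change applied, plus the range filter, written as Python dictionaries. The helper names are assumptions. A number_format_exception for "now" is what a numeric field produces, so it may be worth checking that date is really mapped as date in the index being queried. The thread does not confirm this as the cause.

```python
from typing import Optional


def build_mapping() -> dict:
    """Mapping from the question; body stays in _source but is not searchable."""
    return {
        "properties": {
            "type": {"type": "keyword"},
            "body": {"type": "text", "index": False},
            "id": {"type": "keyword"},
            "date": {"type": "date"},
        }
    }


def date_range_filter(start_ms: int) -> dict:
    """Range filter from the question: epoch milliseconds up to 'now'."""
    return {"range": {"date": {"gte": start_ms, "lte": "now"}}}


def mapped_type(mapping: dict, field: str) -> Optional[str]:
    """Return the declared type of a top-level field, or None if it is unmapped."""
    return mapping.get("properties", {}).get(field, {}).get("type")


if __name__ == "__main__":
    mapping = build_mapping()
    # Date math such as "now" only makes sense when the field is a date.
    print(mapped_type(mapping, "date"), date_range_filter(1468852617347))
```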

Getting illegal_argument_exception", "reason": "Fielddata is disabled on text fields by default elastic search

I am getting this error when I try to run the query below from Postman:
{ "error": { "root_cause": [ { "type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set
fielddata=true on [ID] in order to load fielddata in memory by
uninverting the inverted index. Note that this can however use
significant memory. Alternatively use a keyword field instead." }
Here is the request
{
"size": 11,
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"search.doc.TypeId": {
"value": 1,
"boost": 1.0
}
}
}
],
"adjust_negative": true,
"boost": 1.0
}
}
],
"adjust_negative": true,
"boost": 1.0
}
},
"sort": [
{
"ID": {
"order": "desc"
}
}
]
}
Based on the error, the objectID field appears to be of type text. By default, fielddata is disabled on text fields.
So, as the error suggests, you first need to modify your index mapping so that the text field has fielddata enabled, as shown below:
PUT <index-name>/_mapping
{
"properties": {
"objectID": {
"type": "text",
"fielddata": true
}
}
}
Now use the same search query as given in the question, to get the desired results.
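The error message also mentions using a keyword field instead. A minimal sketch of that alternative sort clause as a Python structure follows. It assumes a keyword sub-field named keyword exists on the field, which is what default dynamic mapping usually creates for strings but which the thread does not confirm.

```python
def sort_by_keyword(field: str, order: str = "desc", subfield: str = "keyword") -> list:
    """Sort clause that targets a keyword sub-field instead of the text field."""
    return [{f"{field}.{subfield}": {"order": order}}]


if __name__ == "__main__":
    # "ID" is the sort field named in the error message from the question.
    print({"size": 11, "sort": sort_by_keyword("ID")})
```

Sorting on a keyword field uses doc values, so it avoids loading fielddata into memory.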

Using Elasticsearch, how do I apply function scores to documents which conditionally have a property

I have a handful of indexes, some of which have a particular date property indicating when it was published (date_publish), and others do not. I am trying to apply a gauss function to decay the score of documents which were published a long time ago. The relevant indexes are correctly configured to recognise the date_publish property as a date.
I have set up my query as follows, specifically filtering documents which do not have the property:
{
"index": "index_contains_prop,index_does_not_contains_prop",
"body": {
"query": {
"function_score": {
"score_mode": "avg",
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": "0"
}
}
},
{
"filter": {
"exists": {
"field": "date_publish"
}
},
"gauss": {
"date_publish": {
"origin": "now",
"scale": "728d",
"offset": "7d",
"decay": 0.5
}
}
}
]
}
},
"from": 0,
"size": 1000
}
}
However, the query errors with the following:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "unknown field [date_publish]",
"line": 1,
"col": 0
}
],
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
"grouped": true,
"failed_shards": [
{
"shard": 0,
"index": "index_does_not_contains_prop",
"node": "1hfXZK4TT3-K288nIr0UWA",
"reason": {
"type": "parsing_exception",
"reason": "unknown field [date_publish]",
"line": 1,
"col": 0
}
}
]
},
"status": 400
}
I have RTFM'd many times, and I can't see any discrepancy. I have also tried wrapping the exists condition in a bool:must object, to no avail.
Have I misunderstood the purpose of the filter argument?
The error most likely comes from the gauss function rather than from the exists filter. A function's filter only decides which documents the function is applied to; the function itself is still parsed on every index in the request, and a decay function needs its field to be defined in the mapping. index_does_not_contains_prop does not have date_publish mapped, so parsing fails on its shards. You can use the put mapping API to add this field to the indexes that don't have it (it won't change any document), and then your query should work.
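A minimal sketch of the put-mapping body for the index that lacks the field, as a Python dictionary. The helper name is an assumption, and the field type should match whatever the other indexes use for date_publish.

```python
import json


def add_date_field_mapping(field: str = "date_publish") -> dict:
    """Put-mapping body that declares a date field without touching any documents."""
    return {"properties": {field: {"type": "date"}}}


if __name__ == "__main__":
    # Index name taken from the question; the body goes to <index>/_mapping.
    target_index = "index_does_not_contains_prop"
    print(target_index, json.dumps(add_date_field_mapping()))
```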

Sort parent type based on one field within an array of nested Object in elasticsearch

I have below mapping in my index:
{
"testIndex": {
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}
}
}
"time_views" is actually an array, but its inner attributes are not arrays.
I want to sort my type1 records by the maximum value of the "views" attribute within each record. I read the Elasticsearch sort documentation; it covers sorting on a field (single or array) of a single nested object, but what I want is different: to pick the maximum value of "views" for each document and sort the documents by these values.
I made this json query
{
"size": 10,
"query": {
"range": {
"timeStamp": {
"gte": 1468852617347,
"lte": 1468939017347
}
}
},
"from": 0,
"sort": [
{
"time_views.views": {
"mode": "max",
"nested_path": "time_views",
"order": "desc"
}
}
]
}
but I got this error
{
"error": {
"phase": "query",
"failed_shards": [
{
"node": "n4rxRCOuSBaGT5xZoa0bHQ",
"reason": {
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
},
"index": "data",
"shard": 0
}
],
"reason": "all shards failed",
"grouped": true,
"type": "search_phase_execution_exception",
"root_cause": [
{
"reason": "[nested] nested object under path [time_views] is not of nested type",
"col": 136,
"line": 1,
"index": "data",
"type": "query_parsing_exception"
}
]
},
"status": 400
}
As I mentioned above, time_views is an array, and I guess that is the cause of this error.
I can't even use the sort-by-array-field feature, because "time_views" is not a primitive type.
I think my last resort is to write a custom sort script, but I don't know how.
Please point out my mistake if what I want is achievable; otherwise, please give me a simple script sample.
Thanks :)
The error message does a lot to explain what is wrong with the query. Actually, the problem is with the mapping. And I think you intended on using nested fields, since you are using nested queries.
You just need to map your time_views field as nested:
"mappings": {
"type1": {
"properties": {
"text": {
"type": "string"
},
"time_views": {
"type": "nested",
"properties": {
"timestamp": {
"type": "long"
},
"views": {
"type": "integer"
}
}
}
}
}
}
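With the nested mapping in place, the sort clause from the question should be accepted. The sketch below rebuilds it as a Python structure; the helper name and the legacy flag are assumptions. An existing object field cannot be switched to nested in place, so the data would have to be reindexed into an index created with the new mapping.

```python
def max_views_sort(path: str = "time_views", field: str = "views", legacy: bool = True) -> list:
    """Sort documents by the largest value of a field inside a nested array."""
    clause = {"mode": "max", "order": "desc"}
    if legacy:
        # Syntax used in the question (older Elasticsearch versions).
        clause["nested_path"] = path
    else:
        # Newer versions express the same thing with a "nested" object.
        clause["nested"] = {"path": path}
    return [{f"{path}.{field}": clause}]


if __name__ == "__main__":
    print({"size": 10, "from": 0, "sort": max_views_sort()})
```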

Resources