Using Elasticsearch, how do I apply function scores to documents which conditionally have a property?

I have a handful of indexes. Some of them have a date property indicating when the document was published (date_publish), and others do not. I am trying to apply a gauss function to decay the score of documents which were published a long time ago. The relevant indexes are correctly configured to recognise the date_publish property as a date.
I have set up my query as follows, specifically filtering out documents which do not have the property:
{
  "index": "index_contains_prop,index_does_not_contains_prop",
  "body": {
    "query": {
      "function_score": {
        "score_mode": "avg",
        "query": {
          "match_all": {}
        },
        "functions": [
          {
            "script_score": {
              "script": {
                "source": "0"
              }
            }
          },
          {
            "filter": {
              "exists": {
                "field": "date_publish"
              }
            },
            "gauss": {
              "date_publish": {
                "origin": "now",
                "scale": "728d",
                "offset": "7d",
                "decay": 0.5
              }
            }
          }
        ]
      }
    },
    "from": 0,
    "size": 1000
  }
}
However, the query fails with the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "unknown field [date_publish]",
        "line": 1,
        "col": 0
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "index_does_not_contains_prop",
        "node": "1hfXZK4TT3-K288nIr0UWA",
        "reason": {
          "type": "parsing_exception",
          "reason": "unknown field [date_publish]",
          "line": 1,
          "col": 0
        }
      }
    ]
  },
  "status": 400
}
I have RTFM'd many times and I can't see any discrepancy. I have also tried wrapping the exists condition in a bool:must object, to no avail.
Have I misunderstood the purpose of the filter argument?

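The error doesn't come from the exists filter: an exists query on a field that isn't mapped in an index simply matches no documents in that index. It comes from the gauss decay function, which is parsed against the mapping of every index the search targets and fails with "unknown field" wherever date_publish is not mapped (note that the failing shard belongs to index_does_not_contains_prop). The filter only decides which documents the function is applied to; it doesn't stop the function from being parsed for an index that lacks the field. You can use the put mapping API to add date_publish as a date field to the indexes that don't have it (this won't change any existing document), and then your query should work.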
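To find which indexes need that mapping update, a short Python sketch can compare the mappings of the targeted indexes and build the put mapping body for each one that lacks the field. It assumes the response shape of GET /<index>/_mapping in Elasticsearch 7.x and later (no mapping types); the helper names are hypothetical, and the code only prepares request bodies rather than calling any client.

```python
# Sketch: find indices whose mapping lacks a field and build the body that adds it.
# Assumes `mappings` looks like a GET /<index>/_mapping response in ES 7.x+:
#   {index_name: {"mappings": {"properties": {...}}}}
# Helper names are hypothetical; nothing here talks to a cluster.


def indices_missing_field(mappings: dict, field: str) -> list:
    """Return the (sorted) names of indices whose top-level properties lack `field`."""
    missing = []
    for index_name, index_mapping in mappings.items():
        properties = index_mapping.get("mappings", {}).get("properties", {})
        if field not in properties:
            missing.append(index_name)
    return sorted(missing)


def date_field_mapping(field: str) -> dict:
    """Body for PUT /<index>/_mapping that adds `field` as a date field."""
    return {"properties": {field: {"type": "date"}}}


if __name__ == "__main__":
    example = {
        "index_contains_prop": {
            "mappings": {"properties": {"date_publish": {"type": "date"}}}
        },
        "index_does_not_contains_prop": {
            "mappings": {"properties": {"title": {"type": "text"}}}
        },
    }
    for name in indices_missing_field(example, "date_publish"):
        print(f"PUT /{name}/_mapping", date_field_mapping("date_publish"))
```

Each printed body would then be sent as a PUT /<index>/_mapping request, for example with curl or an official client.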

Related

ElasticSearch painless filter script on text fields not working

I want to use an equality filter (exact match) with a Painless script in Elasticsearch. I cannot use a term query directly because the check I want to do is on a text field (not a keyword field), so I tried with a match_phrase. This is my mapping (I can't change it):
{
  "my_index": {
    "aliases": {},
    "mappings": {
      "properties": {
        "my_field": {
          "type": "text"
        }
      }
    },
    "settings": {
      "index": {
        "max_ngram_diff": "60",
        "number_of_shards": "8",
        "blocks": {
          "read_only_allow_delete": "false",
          "write": "false"
        },
        "analysis": {...}
      }
    }
  }
}
I tried this query, following this guide:
{
  "size": 10,
  "index": "my_index",
  "body": {
    "query": {
      "bool": {
        "should": [
          {
            "match_phrase": {
              "my_field": {
                "query": "MY_VALUE",
                "boost": 1.5,
                "slop": 0
              }
            }
          }
        ],
        "must": [],
        "filter": [
          {
            "script": {
              "script": {
                "lang": "painless",
                "source": "doc['my_field'] == 'MY_VALUE'"
              }
            }
          }
        ],
        "minimum_should_match": 1
      }
    }
  }
}
However, I got this error:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
          "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
          "doc['my_field'] === 'MY_VALUE'",
          "    ^---- HERE"
        ],
        "script": "doc['my_field'] === 'MY_VALUE'",
        "lang": "painless",
        "position": {
          "offset": 4,
          "start": 0,
          "end": 30
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "R99vOHeORlKsk9dnCzcMeA",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:101)",
            "org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:53)",
            "doc['my_field'] === 'MY_VALUE'",
            "    ^---- HERE"
          ],
          "script": "doc['my_field'] === 'MY_VALUE'",
          "lang": "painless",
          "position": {
            "offset": 4,
            "start": 0,
            "end": 30
          },
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "No field found for [my_field] in mapping with types []"
          }
        }
      }
    ]
  },
  "status": 400
}
It seems that doc doesn't contain text fields (I tried with other non-text fields and it works!)
Here they say that:
Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.
And here they say that:
text fields are searchable by default, but by default are not available for aggregations, sorting, or scripting. Set fielddata=true on your_field_name in order to load fielddata in memory by uninverting the inverted index.
But I can't change the mapping.
How can I access text fields in a Painless filter script?
(This is similar to ElasticSearch exact match on text field with script but more specific on the filtering script)
A script query can only read field values through doc_values (or through fielddata, if it is enabled on the field).
Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. Doc values are supported on almost all field types, with the notable exception of text and annotated_text fields.
As per the discussion in https://github.com/elastic/elasticsearch/issues/30984:
Accessing the _source field is slow and something that we don't want to expose in the ScriptQuery because it would be need to be accessed on every document making the search very inefficient.
So you will need either to add a keyword sub-field to the mapping and reindex your data, or to enable fielddata on the field, which can consume a lot of memory.
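For the keyword sub-field route, here is a Python sketch of the request bodies involved. The sub-field name keyword is an assumption (it is the conventional name, not something from the question), and the helper names are hypothetical. Once the sub-field exists and the existing documents have been reindexed (or rewritten in place, for example with _update_by_query), a plain term query on my_field.keyword already gives an exact match; the script variant is included only because the question asks about a script filter. Note that it compares .value rather than the doc-values object itself, which is a list-like object and never equals a string.

```python
# Sketch: request bodies for an exact match on a text field via a keyword sub-field.
# "my_field" / "MY_VALUE" come from the question; the sub-field name "keyword"
# and these helper names are assumptions for illustration.


def add_keyword_subfield(field: str, subfield: str = "keyword") -> dict:
    """Body for PUT /<index>/_mapping: keep the text field, add a keyword multi-field."""
    return {
        "properties": {
            field: {
                "type": "text",
                "fields": {subfield: {"type": "keyword"}},
            }
        }
    }


def term_filter(field: str, value: str, subfield: str = "keyword") -> dict:
    """Exact match without any script: a term query on the keyword sub-field."""
    return {"bool": {"filter": [{"term": {f"{field}.{subfield}": value}}]}}


def script_filter(field: str, value: str, subfield: str = "keyword") -> dict:
    """Same check as a Painless script filter reading the sub-field's doc values."""
    path = f"{field}.{subfield}"
    # Guard against documents with no value, then compare the single value.
    source = f"doc['{path}'].size() > 0 && doc['{path}'].value == params.expected"
    return {
        "bool": {
            "filter": [
                {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": source,
                            "params": {"expected": value},
                        }
                    }
                }
            ]
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(add_keyword_subfield("my_field"), indent=2))
    print(json.dumps(term_filter("my_field", "MY_VALUE"), indent=2))
    print(json.dumps(script_filter("my_field", "MY_VALUE"), indent=2))
```

If my_field had a custom analyzer or other parameters, the PUT body would need to repeat them, since the mapping update must not conflict with the existing field definition.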

ES giving error when sorting by distance

I'm trying to sort search results by distance. However, when I try, I get the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "sort option [location] not supported"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "roeselaredev",
        "node": "2UYlfd7sTd6qlJWgdK2wzQ",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "sort option [location] not supported"
        }
      }
    ]
  },
  "status": 400
}
The query I sent looks like this:
GET _search
{
  "query": {
    "match_all": []
  },
  "sort": [
    {
      "geo_distance": {
        "location": {
          "lat": 50.9436034,
          "long": 3.1242917
        },
        "order": "asc",
        "unit": "km",
        "distance_type": "plane"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
As near as I can tell, I followed the instructions in the documentation to the letter. I'm not getting a malformed query result; I'm just getting a "not supported" result for the sort-by-distance option. Any ideas as to what I'm doing wrong?
The query DSL is invalid. The OP is almost correct :) but is missing an underscore.
When sorting by distance, the sort key is _geo_distance, not geo_distance. The example below also uses "match_all": {} (match_all takes an object, not an array) and "lon" rather than "long" for the longitude of the point.
Example:
GET _search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 50.9436034,
          "lon": 3.1242917
        },
        "order": "asc",
        "unit": "km",
        "distance_type": "plane"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}

ElasticSearch aggregate information grouped by month

I am working on an Elasticsearch query that should give me back all documents whose date field falls within the last year and then group them by month (giving me the total count for each month), but I am failing to write this query.
This is what I have:
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "account_id": [
              1
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "growth": {
      "date_range": {
        "field": "member_since",
        "format": "YYYY-MM-DD",
        "ranges": [
          {
            "to": "now-1Y/M"
          },
          {
            "from": "now-1Y/M"
          }
        ]
      }
    }
  },
  "size": 100
}
I am running the query with POST https://my-es-cluster-url.com, but I keep getting this error:
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "unit [Y] not supported for date math [-1Y/M]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "index0",
        "node": "MkypXlGdQamAplca1JIgZQ",
        "reason": {
          "type": "parse_exception",
          "reason": "unit [Y] not supported for date math [-1Y/M]"
        }
      }
    ]
  },
  "status": 400
}
I reproduced your problem with a simpler query on my data:
"query": {
  "range": {
    "recent_buy_transaction": {
      "from": "now-1Y"
    }
  }
}
I get the same error.
ElasticSearchParseException[unit [Y] not supported for date math [-1Y]]
However, using a lowercase y fixes the problem, so you should try:
now-1y/M
From the documentation:
The supported time units are: y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second).
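A quick way to catch this before sending a request is to check the units in a date math expression against the list quoted above. This Python sketch only handles expressions anchored at now, with +N/-N offsets and /unit rounding; the unit set simply mirrors the quoted documentation (the documentation for your Elasticsearch version may list more), and the helper name is hypothetical.

```python
import re

# Units from the documentation quoted above: y, M, w, d, h, m, s (case-sensitive).
SUPPORTED_UNITS = set("yMwdhms")

# One date math operation: "+N<unit>" / "-N<unit>", or a "/<unit>" rounding.
_OPERATION = re.compile(r"([+-]\d+|/)([A-Za-z])")


def unsupported_units(expr: str) -> list:
    """Return the units in a 'now...' date math expression that are not supported."""
    if not expr.startswith("now"):
        raise ValueError("this sketch only handles expressions that start with 'now'")
    operations = _OPERATION.findall(expr[len("now"):])
    return [unit for _, unit in operations if unit not in SUPPORTED_UNITS]


if __name__ == "__main__":
    for expression in ("now-1Y/M", "now-1y/M"):
        print(expression, unsupported_units(expression))
```

For the question's now-1Y/M this reports ['Y'], while now-1y/M reports an empty list.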

Sort parent type based on one field within an array of nested objects in Elasticsearch

I have the following mapping in my index:
{
  "testIndex": {
    "mappings": {
      "type1": {
        "properties": {
          "text": {
            "type": "string"
          },
          "time_views": {
            "properties": {
              "timestamp": {
                "type": "long"
              },
              "views": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
"time_views" is actually an array, but its inner attributes are not arrays.
I want to sort my type1 records by the maximum value of the "views" attribute in each record. I read the Elasticsearch sort documentation: it has solutions for use cases where sorting is based on a field (single value or array) of a single nested object, but what I want is different. I want to pick the maximum value of "views" for each document and sort the documents by these values.
I wrote this JSON query:
{
  "size": 10,
  "query": {
    "range": {
      "timeStamp": {
        "gte": 1468852617347,
        "lte": 1468939017347
      }
    }
  },
  "from": 0,
  "sort": [
    {
      "time_views.views": {
        "mode": "max",
        "nested_path": "time_views",
        "order": "desc"
      }
    }
  ]
}
but I got this error:
{
  "error": {
    "phase": "query",
    "failed_shards": [
      {
        "node": "n4rxRCOuSBaGT5xZoa0bHQ",
        "reason": {
          "reason": "[nested] nested object under path [time_views] is not of nested type",
          "col": 136,
          "line": 1,
          "index": "data",
          "type": "query_parsing_exception"
        },
        "index": "data",
        "shard": 0
      }
    ],
    "reason": "all shards failed",
    "grouped": true,
    "type": "search_phase_execution_exception",
    "root_cause": [
      {
        "reason": "[nested] nested object under path [time_views] is not of nested type",
        "col": 136,
        "line": 1,
        "index": "data",
        "type": "query_parsing_exception"
      }
    ]
  },
  "status": 400
}
As I mentioned above, time_views is an array, and I guess this error is caused by that.
I can't even use the sort-on-array-field feature, because "time_views" is not a primitive type.
I think my last chance is to write a custom sort script, but I don't know how.
Please tell me my mistake if it's possible to achieve what I want; otherwise, give me a simple script sample.
Thanks :)
The error message explains a lot about what is wrong: the problem is actually with the mapping, not the query. I think you intended to use nested fields, since your sort uses nested_path.
You just need to map your time_views field as nested. An existing object field can't be changed to nested in place, so this means creating an index with the new mapping and reindexing your data into it:
"mappings": {
  "type1": {
    "properties": {
      "text": {
        "type": "string"
      },
      "time_views": {
        "type": "nested",
        "properties": {
          "timestamp": {
            "type": "long"
          },
          "views": {
            "type": "integer"
          }
        }
      }
    }
  }
}
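With time_views mapped as nested, a sort on time_views.views with "mode": "max" and "order": "desc" should rank each document by the largest views value among its time_views entries. The Python below models that ordering on documents already in memory, purely as an illustration of the semantics (it is not how Elasticsearch evaluates the sort); the sample data is made up, and putting documents without any views value last mirrors the default "missing": "_last" behaviour of sorts.

```python
# Illustration of a nested sort on time_views.views with mode "max", order "desc".
# Plain Python on in-memory documents; not how Elasticsearch implements it.


def max_views(doc: dict):
    """Largest 'views' among the document's time_views entries, or None if there are none."""
    views = [entry["views"] for entry in doc.get("time_views", []) if "views" in entry]
    return max(views) if views else None


def sort_by_max_views_desc(docs: list) -> list:
    """Highest max views first; documents without any views value go last."""
    with_views = [d for d in docs if max_views(d) is not None]
    without_views = [d for d in docs if max_views(d) is None]
    return sorted(with_views, key=max_views, reverse=True) + without_views


if __name__ == "__main__":
    docs = [
        {"text": "a", "time_views": [{"timestamp": 1, "views": 7}]},
        {"text": "b", "time_views": [{"timestamp": 1, "views": 3}, {"timestamp": 2, "views": 10}]},
        {"text": "c", "time_views": []},
    ]
    print([d["text"] for d in sort_by_max_views_desc(docs)])
```

Document b comes first because its best entry (10 views) beats a's only entry (7). Also note that in Elasticsearch 6.1 and later the nested_path sort option is deprecated in favour of "nested": {"path": "time_views"}, so check the sort documentation for the version you run.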

Scripting Elasticsearch 2.1 | No such property: doc

I am facing an issue while trying to execute a script within an ES JSON request.
The request:
POST _search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "aggs": {
    "bucket_histogram": {
      "histogram": {
        "field": "dayTime",
        "interval": 10
      },
      "aggs": {
        "get_average": {
          "avg": {
            "field": "value"
          }
        },
        "check-threshold": {
          "bucket_script": {
            "buckets_path": {
              "averageValue": "get_average"
            },
            "script": "averageValue - doc[\"thresholdValue\"].value"
          }
        }
      }
    }
  }
}
But I get this error instead of the expected values:
{
  "error": {
    "root_cause": [],
    "type": "reduce_search_phase_exception",
    "reason": "[reduce] ",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "groovy_script_execution_exception",
      "reason": "failed to run inline script [averageValue - doc[\"thresholdValue\"].value] using lang [groovy]",
      "caused_by": {
        "type": "missing_property_exception",
        "reason": "No such property: doc for class: 7dcca7d142ac809a7192625d43d95bde9883c434"
      }
    }
  },
  "status": 503
}
Yet if I replace doc[\"thresholdValue\"] with a number, everything works fine.
You are using a bucket_script, which is part of the pipeline aggregations introduced in Elasticsearch 2.0. Pipeline aggregations work on the output of other aggregations, not on documents, which is why no doc context is available to the script.
If you want to process aggregations against specific documents, then perhaps you want the scripted metric aggregation instead.
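The point about pipeline aggregations can be modelled in a few lines of Python: documents are grouped into histogram buckets, each bucket is reduced to named metrics, and the bucket_script callable only ever receives those metric values, never the documents. The threshold metric here (an average of thresholdValue per bucket) is an assumption, not something from the question; in Elasticsearch it would correspond to a second avg aggregation on thresholdValue next to get_average, referenced through buckets_path. That is a different route from the scripted metric aggregation suggested above and only fits if a per-bucket aggregate of the threshold is acceptable.

```python
from collections import defaultdict
from statistics import mean


def histogram_pipeline(docs, interval, bucket_script):
    """Model histogram -> avg metrics -> bucket_script over in-memory documents.

    bucket_script is called with the per-bucket metric values only, which is
    why a real bucket_script has no `doc` variable to read from.
    """
    buckets = defaultdict(list)
    for doc in docs:
        # Histogram key: the value rounded down to a multiple of the interval.
        key = (doc["dayTime"] // interval) * interval
        buckets[key].append(doc)

    results = {}
    for key in sorted(buckets):
        bucket_docs = buckets[key]
        metrics = {
            "averageValue": mean(d["value"] for d in bucket_docs),
            # Assumed extra metric standing in for an avg aggregation on thresholdValue.
            "threshold": mean(d["thresholdValue"] for d in bucket_docs),
        }
        results[key] = bucket_script(**metrics)
    return results


if __name__ == "__main__":
    docs = [
        {"dayTime": 3, "value": 10, "thresholdValue": 4},
        {"dayTime": 7, "value": 20, "thresholdValue": 6},
        {"dayTime": 12, "value": 5, "thresholdValue": 5},
    ]
    print(histogram_pipeline(docs, 10, lambda averageValue, threshold: averageValue - threshold))
```

Unlike a real histogram, this model does not emit empty buckets between the minimum and maximum keys; it only shows what information the bucket-level script can see.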
