AWS Elastic Search(v7.10.2) cannot use `collapse` in conjunction with `search_after` - elasticsearch

I'm using AWS Elasticsearch version 7.10.2 and getting the error "cannot use `collapse` in conjunction with `search_after`" on the following query. What could be a possible fix? I cannot upgrade the ES version, since 7.10.2 appears to be the highest version available in the AWS console. Is there an alternative query that achieves the same result?
I want to use `search_after` for pagination support and `collapse` to pick the latest version of each document.
`streamingSegmentId`: the unique ID of each document.
`streamingSegmentStartTime`: my query can return many unique documents, and I want to sort the entire result set in ascending order of this field.
`fanoutPublishTimestamp`: the timestamp at which the document was published. Since I want the latest version of each document, I sort all the docs that belong to the same `streamingSegmentId` in descending order of this field and use `collapse` to pick the top one.
GET /sessions/_search
{
"size": 2,
"query": {
"bool": {
"must": [
{
"terms": {
"benefitId": [
"PRIME"
],
"boost": 1
}
},
{
"range": {
"streamingSegmentStartTime": {
"from": 1647821557000,
"to": 1647825157000,
"include_lower": true,
"include_upper": false,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"_source": {
"includes": [
"deviceTypeId",
"timeline"
],
"excludes": []
},
"sort": [
{
"streamingSegmentStartTime": {
"order": "asc"
},
"fanoutPublishTimestamp" : {
"order": "desc"
}
}
],
"search_after": [
6749348300022,
6749348300048
],
"collapse": {
"field": "streamingSegmentId"
}
}
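One workaround sometimes used on versions where `collapse` and `search_after` cannot be combined is to page through buckets with a composite aggregation and pick the newest document in each bucket with `top_hits`. This is not from the original thread; it is a minimal sketch that only builds the request body. It assumes every version of a segment shares the same `streamingSegmentStartTime` (so ordering buckets by start time and then by ID matches the intended order) and that `streamingSegmentId` is a keyword or numeric field. The helper name `build_latest_per_segment_body` and the aggregation names `segments`, `latest`, `startTime` and `segmentId` are made up for illustration.

```python
def build_latest_per_segment_body(benefit_id, start_ms, end_ms, page_size=2, after_key=None):
    """Build a search body that returns the newest document per streamingSegmentId.

    Hypothetical helper: buckets are paged with the composite aggregation's
    `after` key instead of collapse + search_after.
    """
    composite = {
        "size": page_size,
        "sources": [
            # Assumption: all versions of a segment share one start time.
            {"startTime": {"terms": {"field": "streamingSegmentStartTime", "order": "asc"}}},
            {"segmentId": {"terms": {"field": "streamingSegmentId"}}},
        ],
    }
    if after_key is not None:
        composite["after"] = after_key

    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"bool": {"filter": [
            {"terms": {"benefitId": [benefit_id]}},
            {"range": {"streamingSegmentStartTime": {"gte": start_ms, "lt": end_ms}}},
        ]}},
        "aggs": {"segments": {
            "composite": composite,
            "aggs": {"latest": {"top_hits": {
                "size": 1,  # newest version only
                "sort": [{"fanoutPublishTimestamp": {"order": "desc"}}],
                "_source": {"includes": ["deviceTypeId", "timeline"]},
            }}},
        }},
    }


first_page = build_latest_per_segment_body("PRIME", 1647821557000, 1647825157000)

# For the next page, pass the `after_key` object returned under
# aggregations.segments in the previous response (values here are invented).
next_page = build_latest_per_segment_body(
    "PRIME", 1647821557000, 1647825157000,
    after_key={"startTime": 1647821557001, "segmentId": "example-id"},
)
```

The resulting dictionary would be sent as the body of `GET /sessions/_search`; pagination stops when a response no longer contains an `after_key`.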

Related

ElasticSearch : how to sort ES document based on document version

The following is my ES query. I want to sort my documents in descending order of "_version", but I'm not sure how to do it.
{
"query":
{
"bool":
{
"must":
[
{
"terms":
{
"streamingSegmentId":
[
"00003319-b7fa-3409-806a-fa3bb5d2be26"
],
"boost": 1
}
},
{
"range":
{
"streamingSegmentStartTime":
{
"from": 1644480000000,
"to": 1647476658447,
"include_lower": true,
"include_upper": false,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"version": true,
"_source":
{
"includes":
[
"errorCount",
"benefitId",
"streamingSegmentStopTime",
"fanoutPublishTimestamp",
"sessionUpdateTime",
"contentSegmentUpdateTime"
],
"excludes":
[]
},
"sort":
[
{
"streamingSegmentStartTime":
{
"order": "asc"
}
},
{
"_version":
{
"order": "desc"
}
}
]
}
I was not able to sort on _version, so I added a timestamp field called fanoutPublishTimestamp to sort my documents in descending order of time. The following is my updated query, in which I use collapse to fetch only the document with the latest timestamp. The problem I'm now facing with this query is that collapse cannot be used with search_after. I'm using search_after to add pagination support to my ES query.
I'm using AWS Elasticsearch, which runs ES version 7.10, and it seems that collapse together with search_after is only supported from ES 8.1. Please let me know if anybody has a better solution to deal with this issue.
GET /sessions/_search
{
"size": 2,
"query": {
"bool": {
"must": [
{
"terms": {
"benefitId": [
"PRIME"
],
"boost": 1
}
},
{
"range": {
"streamingSegmentStartTime": {
"from": 1647821557000,
"to": 1647825157000,
"include_lower": true,
"include_upper": false,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"_source": {
"includes": [
"deviceTypeId",
"timeline"
],
"excludes": []
},
"sort": [
{
"streamingSegmentStartTime": {
"order": "asc"
}
},
{
"fanoutPublishTimestamp": {
"order": "desc"
}
}
],
"search_after": [
"1647821557001",
"1647829603837"
],
"collapse": {
"field": "streamingSegmentId"
}
}
As you didn't provide sample documents and didn't explain what "not working" means, I am assuming the problem is caused by the two sort parameters you are using. If you sort on _version alone, it works (tested locally on my sample documents).
Most likely your other sort criterion, streamingSegmentStartTime, causes some documents with a higher _version to come later in the response. Try removing it and see whether that gives you the expected result.
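To illustrate the point about two sort keys, here is a small plain-Python sketch (no Elasticsearch involved, and the documents are invented). With two keys, the second one only breaks ties within the first, so a document with a high version can still appear late in the list.

```python
# Toy documents; the values are invented for illustration.
docs = [
    {"id": "a", "start": 300, "version": 5},
    {"id": "b", "start": 100, "version": 1},
    {"id": "c", "start": 100, "version": 3},
    {"id": "d", "start": 200, "version": 9},
]

# Two sort keys: start time ascending, then version descending (tie-breaker only).
by_start_then_version = sorted(docs, key=lambda d: (d["start"], -d["version"]))

# Single sort key: version descending.
by_version_only = sorted(docs, key=lambda d: -d["version"])

print([d["id"] for d in by_start_then_version])  # ['c', 'b', 'd', 'a']
print([d["id"] for d in by_version_only])        # ['d', 'a', 'c', 'b']
```

Document "d" has the highest version but comes third in the first ordering, because the start time is compared first.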

Elastic - Filter after selecting top 5 hits

I'm using the alerting feature in Kibana and I want to check whether the last 5 consecutive values of a field exceed a threshold x. However, if I use a filter in my Elasticsearch query, it gets applied before the top N aggregation.
Is there a way to apply the filter afterwards, or to check whether the last consecutive values exceed a threshold using some other selector or method? I don't want to check this in the trigger condition in Painless, because that would return all the documents in the ctx and not just the ones that exceeded the threshold, which are the ones I want to display in my alert message.
I've been stuck on this for a while, and I have only seen blog posts saying that sub-aggregations are not possible on top N, so any help or workaround would be much appreciated.
This is my query :
{
"size": 500,
"query": {
"bool": {
"filter": [
{
"match_all": {
"boost": 1
}
},
{
"match_phrase": {
"client.id": {
"query": "42",
"slop": 0,
"zero_terms_query": "NONE",
"boost": 1
}
}
},
{
"range": {
"@timestamp": {
"from": "{{period_end}}||-10m",
"to": "{{period_end}}",
"include_lower": true,
"include_upper": true,
"format": "epoch_millis",
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"aggs": {
"2": {
"terms": {
"field": "component.name",
"order": {
"_key": "desc"
},
"size": 50
},
"aggs": {
"3": {
"terms": {
"field": "client.name.keyword",
"order": {
"_key": "desc"
},
"size": 5
},
"aggs": {
"1": {
"top_hits": {
"docvalue_fields": [
{
"field": "gc.oldgen.used",
"format": "use_field_mapping"
}
],
"_source": "gc.oldgen.used",
"size": 5,
"sort": [
{
"@timestamp": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
Did you try using a filter sub-aggregation?
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
Or you can use a pipeline aggregation to manipulate your aggregation results:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html
By the way, a term query on the client ID looks more appropriate.
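For reference, the check the question describes ("do the last 5 values all exceed x?") can be sketched outside Elasticsearch as post-processing of the aggregation response. This is only an illustration of the logic, not a Kibana alerting feature. It assumes the response mirrors the aggregation names "2", "3" and "1" from the query above and that each top hit exposes the docvalue field under `fields`. The function name `buckets_over_threshold` and the sample data are invented.

```python
def buckets_over_threshold(response, threshold, field="gc.oldgen.used", n=5):
    """Return (component, client) pairs whose n most recent values all exceed threshold."""
    breaching = []
    for component in response["aggregations"]["2"]["buckets"]:
        for client in component["3"]["buckets"]:
            hits = client["1"]["hits"]["hits"]
            # docvalue_fields are returned as single-element lists under "fields"
            values = [hit["fields"][field][0] for hit in hits]
            if len(values) == n and all(v > threshold for v in values):
                breaching.append((component["key"], client["key"]))
    return breaching


def _hits(values):
    """Build a fake top_hits result from a list of numbers (test data only)."""
    return {"hits": {"hits": [{"fields": {"gc.oldgen.used": [v]}} for v in values]}}


sample_response = {"aggregations": {"2": {"buckets": [
    {"key": "jvm", "3": {"buckets": [
        {"key": "client-a", "1": _hits([90, 95, 91, 99, 93])},
        {"key": "client-b", "1": _hits([90, 40, 91, 99, 93])},
    ]}},
]}}}

print(buckets_over_threshold(sample_response, threshold=80))  # [('jvm', 'client-a')]
```

Only "client-a" is reported, because one of the five most recent values for "client-b" is below the threshold.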

Elasticsearch wildcard query on numeric fields without using mapping

I'm looking for a creative solution, because I can't change the mapping: the solution is already in production.
I have this query:
{
"size": 4,
"query": {
"bool": {
"filter": [
{
"range": {
"time": {
"from": 1597249812405,
"to": null
}
}
},
{
"query_string": {
"query": "*181*",
"fields": [
"deId^1.0",
"deTag^1.0"
],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"sort": [
{
"time": {
"order": "asc"
}
}
]
}
The "deId" field is an integer in Elasticsearch, and the query returns nothing (though it should).
Is there a way to search with wildcards in numeric fields without using the multi-field option, which requires a mapping change?
Once you index an integer, ES does not treat the individual digits as position-sensitive tokens. In other words, it's not directly possible to perform wildcards on numeric datatypes.
There are some sub-optimal ways of solving this (think scripting & String.substring) but the easiest would be to convert those integers to strings.
Let's look at an example deId of 123181994:
POST prod/_doc
{
"deId_str": "123181994"
}
then
GET prod/_search
{
"query": {
"bool": {
"filter": [
{
"query_string": {
"query": "*181*",
"fields": [
"deId_str"
]
}
}
]
}
}
}
works like a charm.
Since your index/mapping is already in production, look into _update_by_query and stringify all the necessary numbers in a single call. After that, if you don't want to (and/or cannot) pass the strings at index time, use ingest pipelines to do the conversion for you.
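The two steps mentioned above could look roughly like the request bodies below. This is a minimal sketch that only builds and prints the JSON. The function names are hypothetical, and it assumes the string copy should live in a new field called `deId_str`, as in the example above.

```python
import json


def build_stringify_update(field="deId", target="deId_str"):
    """Body for POST <index>/_update_by_query: copy an integer into a string field."""
    return {
        "query": {"exists": {"field": field}},
        "script": {
            "lang": "painless",
            "source": "ctx._source[params.target] = String.valueOf(ctx._source[params.field])",
            "params": {"field": field, "target": target},
        },
    }


def build_stringify_pipeline(field="deId", target="deId_str"):
    """Body for PUT _ingest/pipeline/<name>: do the same conversion at index time."""
    return {
        "description": "Copy an integer field into a string field at index time",
        "processors": [
            {"convert": {
                "field": field,
                "target_field": target,
                "type": "string",
                "ignore_missing": True,
            }}
        ],
    }


print(json.dumps(build_stringify_update(), indent=2))
print(json.dumps(build_stringify_pipeline(), indent=2))
```

The first body backfills existing documents in one call; the second would be registered as an ingest pipeline and referenced when indexing new documents.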

Elastic search query is not executed

Hi, I am using Elasticsearch to search for items that are placed in buildings. When I run this query, the items returned are not sorted, even if I change the sort direction. My first impression is that the sort block is not executed at all. Is there something wrong with the query?
{
"from": 0,
"size": 20,
"query": {
"bool": {
"filter": [
{
"terms": {
"buildingsUuid": [
"9caff147-d019-416a-a167-f02bab7334fd"
],
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"sort": [
{
"itemId": {
"order": "desc"
}
}
]
}

query not applying custom score

I'm running the following query. My problem is that the custom score (script_score) is not being applied. Am I doing something wrong?
{
"query": {
"bool": {
"must": [
{
"terms": {
"tactics": [
"user_id",
"type_user",
"browser_plugins",
"cashback"
]
}
}
]
},
"script_score": {
"script": "type_user === 2 ? 1 : 2"
}
},
"from": "0",
"size": 50,
"sort": {
"name": {
"order": "desc",
"ignore_unmapped": true
}
}
}
The script_score section in your query gets ignored. If you want it to be taken into account, you need to wrap your existing bool query in a function_score query, where you can use the script_score part as well.
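A minimal sketch of that wrapping, built as a Python dictionary: the helper name `wrap_in_function_score` is made up, and the script is rewritten in Painless on the assumption that `type_user` is a numeric field with doc values (the original `===` expression suggests an older scripting language). Note also that once an explicit sort such as the one on "name" is present, hits are ordered by that sort and not by score, so the sort is left out here.

```python
def wrap_in_function_score(inner_query, script_source):
    """Wrap an existing query so that a script computes the score."""
    return {
        "function_score": {
            "query": inner_query,
            "script_score": {"script": {"source": script_source}},
        }
    }


bool_query = {
    "bool": {
        "must": [
            {"terms": {"tactics": ["user_id", "type_user", "browser_plugins", "cashback"]}}
        ]
    }
}

body = {
    # Assumption: type_user is numeric and has doc values.
    "query": wrap_in_function_score(bool_query, "doc['type_user'].value == 2 ? 1 : 2"),
    "from": 0,
    "size": 50,
}
```

The original bool query is passed through unchanged as the `query` of the function_score, so the same documents match and only the scoring differs.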
