Replacing OR/AND/NOT filters with bool filter creates a hard-to-understand query with too many levels? - elasticsearch

I have the following filter in a filtered query. As you can see, it nests many OR/AND/NOT filters at different levels. I was advised to replace them with bool filters for performance reasons, and I plan to do that.
"filter" : {
"or" : [
{
"and" : [
{ "range" : { "start" : { "lte": 201407292300 } } },
{ "range" : { "end" : { "gte": 201407292300 } } },
{ "term" : { "condtion1" : false } },
{
"or" : [
{
"and" : [
{ "term" : { "condtion2" : false } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
},
{
"and" : [
{ "term" : { "condtion2" : true } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion3" } },
{ "missing" : { "field" : "condtion4" } },
{ "missing" : { "field" : "condtion5" } },
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion3" : "abc" } },
{ "term" : { "condtion4" : "def" } },
{ "term" : { "condtion5" : "ghj" } },
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_1" }
},
{ "range" : { "start" : { "lte": 201407302300 } } },
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values" : [100, 10] } }
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_2" }
},
{ "ids" : { "values" : [100, 10] } }
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_3" }
},
{
"or": [
{ "term" : { "condtion1" : true } },
{ "range" : { "end" : { "lt": 201407302300 } } }
]
},
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values" : [100, 10] } }
]
}
]
}
]
}
]
}
However, I feel replacing these OR/AND/NOT filters would create a query that has too many levels and is hard to understand. For example, replacing
"or": [
....
]
I would have to write:
"bool" : {
"should": [
]
}
Am I right that, in my case, replacing OR/AND/NOT with bool filters sacrifices understandability?
A related question:
If I have to replace OR/AND/NOT filters for performance, should I replace ALL of them, or just some of them, such as the top-level one?

You should replace all of them except geo/script/range filters. That said, understanding the likely impact of each filter also helps. For example, if one filter is going to eliminate, say, 90% of the results, you may want to put it first in an and filter. Since and/or filters are executed sequentially, the remaining filters then have fewer documents to process. With bool filters, by contrast, all the clauses are combined in a single bitset operation. You may have already read about this.
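A rough way to see why order matters in a sequential and filter: the Python sketch below simulates short-circuit evaluation over an in-memory list and counts how many times a filter is evaluated. It is a toy model, not Elasticsearch's implementation; the documents and the two predicates (is_condtion2, starts_early) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]
Filter = Callable[[Doc], bool]


def sequential_and(docs: List[Doc], filters: List[Filter]) -> Tuple[List[Doc], int]:
    """Apply filters one after another, like a short-circuiting and filter.

    Each filter only sees the documents that survived the previous ones.
    Returns the matching documents and the total number of filter evaluations.
    """
    evaluations = 0
    remaining = docs
    for f in filters:
        survivors = []
        for doc in remaining:
            evaluations += 1
            if f(doc):
                survivors.append(doc)
        remaining = survivors
    return remaining, evaluations


def is_condtion2(doc: Doc) -> bool:
    """Hypothetical selective filter: keeps about 10% of the docs."""
    return bool(doc["condtion2"])


def starts_early(doc: Doc) -> bool:
    """Hypothetical broad filter: keeps about 90% of the docs."""
    return doc["start"] <= 900


# 1,000 hypothetical documents; every 10th one has condtion2 == True.
docs = [{"id": i, "condtion2": i % 10 == 0, "start": i} for i in range(1000)]

hits_a, cost_selective_first = sequential_and(docs, [is_condtion2, starts_early])
hits_b, cost_broad_first = sequential_and(docs, [starts_early, is_condtion2])
print(cost_selective_first, cost_broad_first)  # 1100 1901
```

Both orders return the same 91 documents; only the amount of work differs, which is the point the answer makes about putting the most selective filter first.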
I don't think you will sacrifice understandability by replacing OR/AND/NOT with bool filters. In the example you gave, converting a single or filter into a should clause looks like it adds structure, but across the whole combination the structure stays almost the same.
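Because each and/or/not maps one-to-one onto a bool with must/should/must_not, the conversion can be done mechanically without changing the shape of the logic. The Python sketch below rewrites a filter tree held as plain dicts; it assumes the array form of and/or ({"and": [...]}) and either {"not": {...}} or {"not": {"filter": {...}}} for not. It converts every and/or, so if you follow the advice above about geo/script/range filters you may want to leave those particular clauses in an and filter by hand.

```python
import json
from typing import Any, Dict


def to_bool(f: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively rewrite and/or/not filters into bool filters.

    and -> bool.must, or -> bool.should, not -> bool.must_not.
    Any other filter (term, range, missing, ids, ...) is returned unchanged.
    """
    if "and" in f:
        return {"bool": {"must": [to_bool(c) for c in f["and"]]}}
    if "or" in f:
        # With only should clauses, at least one of them has to match (OR).
        return {"bool": {"should": [to_bool(c) for c in f["or"]]}}
    if "not" in f:
        # Accept both {"not": {...}} and {"not": {"filter": {...}}}.
        inner = f["not"].get("filter", f["not"])
        return {"bool": {"must_not": [to_bool(inner)]}}
    return f


original = {
    "and": [
        {"term": {"condtion2": True}},
        {"or": [
            {"term": {"condtion3": "abc"}},
            {"not": {"missing": {"field": "condtion4"}}},
        ]},
    ]
}
print(json.dumps(to_bool(original), indent=2))
```

Each logical level still becomes exactly one bool, so the nesting depth of the logic is unchanged; only the extra "bool" wrapper key is added per level.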

Related

ElasticSearch is not sorting file names in correct order

This is a contrived example to illustrate my problem. I have a bunch of file names that I would like to sort alphabetically, the same way macOS does in a Finder window.
These are my indexed file names in the order I would expect to see them sorted:
A Tribe Called Quest - Can I Kick It (1).mp3
a.png
Bcc 05.png
Birling Gap Cliffs.jpg
Durdle Door.jpg
f.png
Frost.jpg
p.png
Users order.mp4
z.png
And this is what I'm doing in Kibana dev tools to test:
## sorting contrived example
# create the index with keyword filename for sorting
PUT /file-names
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc" : {
"properties": {
"filename": { "type": "keyword" }
}
}
}
}
# create bunch of documents
POST file-names/_doc/_bulk
{ "index":{} }
{ "filename":"A Tribe Called Quest - Can I Kick It (1).mp3" }
{ "index":{} }
{ "filename":"a.png" }
{ "index":{} }
{ "filename":"Bcc 05.png" }
{ "index":{} }
{ "filename":"Birling Gap Cliffs.jpg" }
{ "index":{} }
{ "filename":"Durdle Door.jpg" }
{ "index":{} }
{ "filename":"Frost.jpg" }
{ "index":{} }
{ "filename":"f.png" }
{ "index":{} }
{ "filename":"Users order.mp4" }
{ "index":{} }
{ "filename":"p.png" }
{ "index":{} }
{ "filename":"z.png" }
# query with sort - bugged
GET /file-names/_search
{
"sort": {
"filename": {
"order": "asc"
}
}
}
The results I'm getting back are:
"hits" : [
{
"_index" : "file-names",..."_score" : null,
"_source" : {
"filename" : "A Tribe Called Quest - Can I Kick It (1).mp3"
},
"sort" : [
"A Tribe Called Quest - Can I Kick It (1).mp3"
]
},
{
...
"_source" : {
"filename" : "Bcc 05.png"
},
"sort" : [
"Bcc 05.png"
]
},
{
...
"_source" : {
"filename" : "Birling Gap Cliffs.jpg"
},
"sort" : [
"Birling Gap Cliffs.jpg"
]
},
{
...
"_source" : {
"filename" : "Durdle Door.jpg"
},
"sort" : [
"Durdle Door.jpg"
]
},
{
...
"_source" : {
"filename" : "Frost.jpg"
},
"sort" : [
"Frost.jpg"
]
},
{
...
"_source" : {
"filename" : "Users order.mp4"
},
"sort" : [
"Users order.mp4"
]
},
{
...
"_source" : {
"filename" : "a.png"
},
"sort" : [
"a.png"
]
},
{
...
"_source" : {
"filename" : "f.png"
},
"sort" : [
"f.png"
]
},
{
...
"_source" : {
"filename" : "p.png"
},
"sort" : [
"p.png"
]
},
{
...
"_source" : {
"filename" : "z.png"
},
"sort" : [
"z.png"
]
}
]
These are not in the order I'd expect: "a.png" comes after "Users order.mp4", for reasons I cannot understand.
Any help appreciated to get sorting working in the order I'd expect!
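As @Alper suggested, this has already been addressed. In short, a keyword field is sorted by the raw (UTF-8) bytes of each value, so every uppercase letter sorts before every lowercase letter; that is why "a.png" lands after "Users order.mp4". A common fix is to give the keyword field a normalizer that lowercases its values.
If for some reason you need to stick with the plain keyword mapping, here's how you can sort with a script: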
GET /file-names/_search
{
"sort": {
"_script": {
"type": "string",
"script": {
"lang": "painless",
"source": "doc['filename'].value.toLowerCase()"
},
"order": "asc"
}
}
}
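To see where the unexpected order comes from, the Python sketch below sorts the question's file names two ways: by raw UTF-8 bytes, which reproduces the order Elasticsearch returned, and by a lowercased key, which is what the toLowerCase() script sort does. It only models these ASCII names; for other characters, Python's str.lower and Java's toLowerCase may not agree in every case.

```python
FILE_NAMES = [
    "A Tribe Called Quest - Can I Kick It (1).mp3",
    "a.png",
    "Bcc 05.png",
    "Birling Gap Cliffs.jpg",
    "Durdle Door.jpg",
    "f.png",
    "Frost.jpg",
    "p.png",
    "Users order.mp4",
    "z.png",
]

# Byte-wise order, as for a plain keyword sort: uppercase ASCII letters
# (0x41-0x5A) all come before lowercase ones (0x61-0x7A).
byte_order = sorted(FILE_NAMES, key=lambda name: name.encode("utf-8"))

# Case-insensitive order, as produced by sorting on the lowercased value.
folded_order = sorted(FILE_NAMES, key=str.lower)

print(byte_order)
print(folded_order)
```

The case-insensitive order matches the expected list exactly, which is also why the script sort needs "asc" rather than "desc".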

Cannot divide multiple context values in Elasticsearch watcher

I have created a watch in Elasticsearch to alert me if HTTP errors make up more than 15% of total requests over 60 minutes.
I am using chain inputs to generate the dividend and divisor values for my ratio calculation.
In my condition, I use a script to do the division and check whether the result is greater than my ratio.
However, whenever I use two ctx parameters in the division, the result always equals zero.
If I experiment and use only one ctx parameter, it works fine.
It seems that two ctx parameters cannot be used together in a condition.
Does anyone know how to get around this?
Below is my watch.
{
"trigger" : {
"schedule" : {
"interval" : "5m"
}
},
"input" : {
"chain":{
"inputs": [
{
"first": {
"search" : {
"request" : {
"indices" : [ "logstash-*" ],
"body" : {
"query" : {
"bool":{
"must": [
{
"match" : {"d.uri": "xxxxxxxx"}
},
{
"match" : {"topic": "xxxxxxxx"}
}
],
"filter": {
"range": {
"@timestamp": {
"gte": "now-60m"
}
}
}
}
}
}
},
"extract": ["hits.total"]
}
}
},
{
"second": {
"search" : {
"request" : {
"body" : {
"query" : {
"bool":{
"must": [
{
"match" : {"d.uri": "xxxxxxxx"}
},
{
"match" : {"topic": "xxxxxxxx"}
},
{
"match" : {"d.status": "401"}
}
],
"filter": {
"range": {
"@timestamp": {
"gte": "now-60m"
}
}
}
}
}
}
},
"extract": ["hits.total"]
}
}
}
]
}
},
"condition" : {
"script" : {
"source" : "return (ctx.payload.second.hits.total / ctx.payload.first.hits.total) == 0"
}
}
}
The issue turned out to be integer division: both hit counts are integers, so dividing the smaller one by the larger one to get a ratio of the form 0.xx truncates the result to 0.
I reversed the operation and it now works fine.
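The Python sketch below reproduces the truncation and shows two ways around it. It assumes hits.total is a plain non-negative integer and uses hypothetical counts (30 errors out of 120 requests). In Painless, which follows Java's arithmetic rules, casting one operand, for example (double) ctx.payload.second.hits.total / ctx.payload.first.hits.total > 0.15, should have the same effect as the floating-point version; treat that as an untested suggestion.

```python
def java_int_div(a: int, b: int) -> int:
    """Mimic Java/Painless integer division for non-negative operands.

    Java truncates toward zero; for non-negative values this equals Python's //.
    """
    return a // b


def error_ratio_exceeded(errors: int, total: int, threshold: float = 0.15) -> bool:
    """Floating-point comparison, like casting one operand to double."""
    if total == 0:
        return False  # no traffic: nothing to alert on
    return errors / total > threshold


def error_ratio_exceeded_int(errors: int, total: int) -> bool:
    """Integer-only comparison: errors / total > 15/100 <=> errors * 100 > total * 15."""
    return errors * 100 > total * 15


errors, total = 30, 120  # hypothetical hit counts: a 25% error ratio
print(java_int_div(errors, total))          # 0, the value the watch condition saw
print(error_ratio_exceeded(errors, total))  # True
print(error_ratio_exceeded_int(errors, total))  # True
```

The cross-multiplied form avoids fractions entirely, which is one way of "reversing" the operation so that no intermediate result is truncated.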

Querying a nested array in Elasticsearch

I have an index in Elasticsearch with the following mapping:
PUT /try1
{
"mappings" : {
"product" : {
"properties" : {
"name": { "type" : "text" },
"categories": {
"type": "nested",
"properties": {
"range":{"type":"text"}
}
}
}
}
}
}
The range field holds an array of words: ["high","medium","low"]
I need to access the range element inside the nested category. I tried using the following syntax:
GET /try1/product/_search
{
"query": {
"nested" : {
"path" : "categories",
"query" : {
"bool" : {
"must" : [
{ "match" : {"categories.range": "low"} }
]
}
}
}
}
}
However, I am getting an error with the message:
"reason": """failed to create query:...
Can someone please offer a solution to this?
@KGB, can you try writing your query slightly differently, like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"categories.range": "low"
}
}
]
}
}
}
{
"query": {
"nested" : {
"path" : "categories",
"query" : {
"bool" : {
"must" : [
{ "match" : { "categories.range" : "low" } }
]
}
}
}
}
}
This worked perfectly

OR & AND Operators in Elastic search

I want to achieve the following in Elasticsearch:
select * from products where (brandName = Accu or brandName = Perfor) AND cat = lube (where "lube" can appear anywhere, in any field of the Elasticsearch document).
I am using this query in Elasticsearch:
{
"bool": {
"must": {
"query_string": {
"query": "oil"
}
},
"should": [
{
"term": {
"BrandName": "Accu"
}
},
{
"term": {
"BrandName": "Perfor"
}
}
]
}
}
With this query I'm not getting the exact combination of results I want.
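You need to add minimum_should_match: 1 to your query: when a bool query has a must clause, its should clauses are optional by default and only affect scoring. You should also probably use match instead of term if your BrandName field is an analyzed string.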
{
"bool" : {
"must" : {
"query_string" : {
"query" : "oil OR lube OR lubricate"
}
},
"minimum_should_match": 1, <---- add this
"should" : [ {
"match" : {
"BrandName" : "Accu"
}
}, {
"match" : {
"BrandName" : "Perfor"
}
} ]
}
}
This query satisfies your condition:
{
"bool" : {
"must" : { "term" : { "cat" : "lube" } },
"should" : [
{ "term" : { "BrandName" : "Accu" } },
{ "term" : { "BrandName" : "Perfor" } }
],
"minimum_should_match" : 1
}
}
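The effect of minimum_should_match can be checked with a toy model. The Python sketch below evaluates must and should clauses over a few hypothetical product documents, using exact string comparison in place of term/match queries and ignoring scoring and analysis. With minimum_should_match at 0 (the default once a must clause is present), the "Other" product that only matches cat = lube is returned too; setting it to 1 gives the (Accu OR Perfor) AND lube semantics.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]


def bool_matches(doc: Doc, must: List[Clause], should: List[Clause],
                 minimum_should_match: int) -> bool:
    """Toy model of bool query matching (no scoring, no text analysis).

    Every must clause has to match, and at least minimum_should_match
    of the should clauses have to match.
    """
    if not all(clause(doc) for clause in must):
        return False
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match


# Hypothetical documents.
products = [
    {"BrandName": "Accu", "cat": "lube"},
    {"BrandName": "Perfor", "cat": "lube"},
    {"BrandName": "Other", "cat": "lube"},
    {"BrandName": "Accu", "cat": "filter"},
]
must = [lambda d: d["cat"] == "lube"]
should = [lambda d: d["BrandName"] == "Accu", lambda d: d["BrandName"] == "Perfor"]

without_msm = [p for p in products if bool_matches(p, must, should, 0)]
with_msm = [p for p in products if bool_matches(p, must, should, 1)]
print([p["BrandName"] for p in without_msm])
print([p["BrandName"] for p in with_msm])
```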

How to order by many values in Elasticsearch terms aggregations

How do you order ES terms aggregations by multiple values?
At the moment I do:
aggs : {
aggName : {
terms : {
field : "foo",
order : { "subAgg.avg" : "desc" }
},
aggs : {
subAgg : {
stats : {
field : "bar"
}
}
}
}
}
The API says you can do:
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
But this does not work, ES throws an error:
Unknown key for a START_ARRAY in [aggName]: [order].
I found something like this in other posts:
order : { "subAgg.avg" : "desc", "subAgg.count" : "desc" }
No error, but not sorted correctly.
My question is: how do I correctly sort by multiple values?
I have ES 1.4.4 installed.
EDITED:
Mapping
{
"mappings" : {
"mymapping" : {
"properties" : {
"foo" : {
"type" : "short"
}
}
}
}
}
Query:
{
query : {
match_all : {}
},
aggs : {
aggName : {
terms : {
field : "foo",
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
},
aggs : {
subAgg : {
stats : {
field : "foo"
}
}
}
}
}
}
You can try this:
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
From: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
{
"aggs" : {
"countries" : {
"terms" : {
"field" : "artist.country",
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
},
"aggs" : {
"rock" : {
"filter" : { "term" : { "genre" : "rock" }},
"aggs" : {
"playback_stats" : { "stats" : { "field" : "play_count" }}
}
}
}
}
}
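An order array behaves like a multi-key comparison: buckets are compared on the first criterion, and each later criterion only breaks ties left by the earlier ones. The Python sketch below reproduces that on hypothetical in-memory buckets (an avg value standing in for a stats sub-aggregation, plus a doc_count); it illustrates the ordering semantics only, not how Elasticsearch computes them.

```python
# Hypothetical buckets: each has a document count and the avg of a stats sub-agg.
buckets = [
    {"key": "US", "avg": 4.0, "doc_count": 10},
    {"key": "FR", "avg": 5.0, "doc_count": 3},
    {"key": "DE", "avg": 4.0, "doc_count": 25},
    {"key": "IT", "avg": 2.5, "doc_count": 40},
]

# Mirrors "order": [{"...avg": "desc"}, {"_count": "desc"}]:
# sort by avg descending, then break ties by doc_count descending.
ordered = sorted(buckets, key=lambda b: (-b["avg"], -b["doc_count"]))
print([b["key"] for b in ordered])  # ['FR', 'DE', 'US', 'IT']
```

US and DE tie on avg, so the second criterion (doc_count) decides that DE comes first.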
