I have the following filter in a filtered query. As seen, it has many OR/AND/NOT filters at different levels. I was advised to replace them with bool filters for performance reasons, and I am going to do that.
"filter" : {
"or" : [
{
"and" : [
{ "range" : { "start" : { "lte": 201407292300 } } },
{ "range" : { "end" : { "gte": 201407292300 } } },
{ "term" : { "condtion1" : false } },
{
"or" : [
{
"and" : [
{ "term" : { "condtion2" : false } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
},
{
"and" : [
{ "term" : { "condtion2" : true } },
{
"or": [
{
"and" : [
{ "missing" : { "field" : "condtion3" } },
{ "missing" : { "field" : "condtion4" } },
{ "missing" : { "field" : "condtion5" } },
{ "missing" : { "field" : "condtion6" } },
{ "missing" : { "field" : "condtion7" } }
]
},
{ "term" : { "condtion3" : "abc" } },
{ "term" : { "condtion4" : "def" } },
{ "term" : { "condtion5" : "ghj" } },
{ "term" : { "condtion6" : "nop" } },
{ "term" : { "condtion7" : "rst" } }
]
}
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_1" }
},
{ "range" : { "start" : { "lte": 201407302300 } } },
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values": [100, 10] } }
]
}
]
}
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_2" }
},
{ "ids" : { "values": [100, 10] } }
]
},
{
"and" : [
{
"term": { "condtion8" : "TIME_POINT_3" }
},
{
"or": [
{ "term" : { "condtion1" : true } },
{ "range" : { "end" : { "lt": 201407302300 } } }
]
},
{
"or": [
{ "term" : { "condtion9" : "GROUP_B" } },
{
"and" : [
{ "term" : { "condtion9" : "GROUP_A" } },
{ "ids" : { "values": [100, 10] } }
]
}
]
}
]
}
]
}
However, I feel that replacing these OR/AND/NOT filters would create a query that has too many levels and is hard to understand. For example, to replace
"or": [
....
]
I have to have:
"bool": {
"should": [
]
}
Am I right that replacing OR/AND/NOT with bool filter in my case is at the expense of sacrificing understandability?
A related question
If I have to replace OR/AND/NOT filters for performance, should I replace ALL of these OR/AND/NOT filters, or just some of them such as the one at the top for example?
Thanks and regards.
You should replace all of them except geo/script/range filters. That said, understanding the likely impact of each filter also helps. For example, if one of the filters is going to exclude, say, 90% of the documents, you may want to put it first in an and filter. Since and/or filters are executed sequentially, the remaining filters will then have fewer documents to process. With bool filters, by contrast, all the filters are combined in a single bitset operation. You might have already read about it.
I don't think you will be sacrificing understandability by replacing OR/AND/NOT with bool filters. In the example you have given, converting a single or filter to a should clause looks like an increase in query structure, but across the whole query the structure would be almost the same.
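To illustrate how mechanical the rewrite is, here is a minimal Python sketch that converts a nested and/or/not filter tree into bool filters. The helper name to_bool is hypothetical, and the sketch assumes the short forms only (and/or given as arrays, not given as a single filter object). It converts every level and does not special-case geo/script/range filters, so treat it as a starting point rather than a finished tool.

```python
def to_bool(f):
    """Recursively rewrite legacy and/or/not filters as bool filters.

    Assumes "and"/"or" take a list of filters and "not" takes one filter
    object; any other filter is treated as a leaf and returned unchanged.
    """
    if isinstance(f, dict) and len(f) == 1:
        key, body = next(iter(f.items()))
        if key == "and" and isinstance(body, list):
            return {"bool": {"must": [to_bool(x) for x in body]}}
        if key == "or" and isinstance(body, list):
            return {"bool": {"should": [to_bool(x) for x in body]}}
        if key == "not" and isinstance(body, dict):
            return {"bool": {"must_not": [to_bool(body)]}}
    return f  # leaf filter such as term, range, ids, missing


# One branch taken from the question's filter
legacy = {
    "or": [
        {"term": {"condtion9": "GROUP_B"}},
        {"and": [
            {"term": {"condtion9": "GROUP_A"}},
            {"ids": {"values": [100, 10]}},
        ]},
    ]
}
converted = to_bool(legacy)
```

The nesting depth of converted is one level deeper per and/or node, which matches the point above that the overall shape stays close to the original.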
This is a contrived example to illustrate my problem. I have a bunch of filenames that I would like to sort alphabetically, the same way macOS would in a Finder window.
These are my indexed file names in the order I would expect to see them sorted:
A Tribe Called Quest - Can I Kick It (1).mp3
a.png
Bcc 05.png
Birling Gap Cliffs.jpg
Durdle Door.jpg
f.png
Frost.jpg
p.png
Users order.mp4
z.png
And this is what I'm doing in Kibana dev tools to test:
## sorting contrived example
# create the index with keyword filename for sorting
PUT /file-names
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"_doc" : {
"properties": {
"filename": { "type": "keyword" }
}
}
}
}
# create bunch of documents
POST file-names/_doc/_bulk
{ "index":{} }
{ "filename":"A Tribe Called Quest - Can I Kick It (1).mp3" }
{ "index":{} }
{ "filename":"a.png" }
{ "index":{} }
{ "filename":"Bcc 05.png" }
{ "index":{} }
{ "filename":"Birling Gap Cliffs.jpg" }
{ "index":{} }
{ "filename":"Durdle Door.jpg" }
{ "index":{} }
{ "filename":"Frost.jpg" }
{ "index":{} }
{ "filename":"f.png" }
{ "index":{} }
{ "filename":"Users order.mp4" }
{ "index":{} }
{ "filename":"p.png" }
{ "index":{} }
{ "filename":"z.png" }
# query with sort - bugged
GET /file-names/_search
{
"sort": {
"filename": {
"order": "asc"
}
}
}
The results I'm getting back are:
"hits" : [
{
"_index" : "file-names",..."_score" : null,
"_source" : {
"filename" : "A Tribe Called Quest - Can I Kick It (1).mp3"
},
"sort" : [
"A Tribe Called Quest - Can I Kick It (1).mp3"
]
},
{
...
"_source" : {
"filename" : "Bcc 05.png"
},
"sort" : [
"Bcc 05.png"
]
},
{
...
"_source" : {
"filename" : "Birling Gap Cliffs.jpg"
},
"sort" : [
"Birling Gap Cliffs.jpg"
]
},
{
...
"_source" : {
"filename" : "Durdle Door.jpg"
},
"sort" : [
"Durdle Door.jpg"
]
},
{
...
"_source" : {
"filename" : "Frost.jpg"
},
"sort" : [
"Frost.jpg"
]
},
{
...
"_source" : {
"filename" : "Users order.mp4"
},
"sort" : [
"Users order.mp4"
]
},
{
...
"_source" : {
"filename" : "a.png"
},
"sort" : [
"a.png"
]
},
{
...
"_source" : {
"filename" : "f.png"
},
"sort" : [
"f.png"
]
},
{
...
"_source" : {
"filename" : "p.png"
},
"sort" : [
"p.png"
]
},
{
...
"_source" : {
"filename" : "z.png"
},
"sort" : [
"z.png"
]
}
]
Which are not in the order I'd expect. You can see "a.png" is below "Users order.mp4" for reasons I cannot understand.
Any help appreciated to get sorting working in the order I'd expect!
As @Alper suggested, this has already been addressed.
If you for some reason need to stick with the keyword mapping, here's how you can script-sort:
GET /file-names/_search
{
"sort": {
"_script": {
"type": "string",
"script": {
"lang": "painless",
"source": "doc['filename'].value.toLowerCase()"
},
"order": "desc"
}
}
}
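The ordering in the question is what a plain code-point comparison produces: every uppercase ASCII letter sorts before every lowercase one. The Python sketch below reproduces both orderings on the question's filenames. It only emulates the comparison locally (Python compares code points, which agrees with byte order for these ASCII names) and is not a statement about how Elasticsearch implements keyword sorting.

```python
filenames = [
    "A Tribe Called Quest - Can I Kick It (1).mp3",
    "a.png",
    "Bcc 05.png",
    "Birling Gap Cliffs.jpg",
    "Durdle Door.jpg",
    "Frost.jpg",
    "f.png",
    "Users order.mp4",
    "p.png",
    "z.png",
]

# Case-sensitive comparison: "U" (code point 85) sorts before "a" (97)
case_sensitive = sorted(filenames)

# Case-insensitive comparison: lowercase each name before comparing,
# which is what the script sort's toLowerCase() does
case_insensitive = sorted(filenames, key=str.lower)

for name in case_insensitive:
    print(name)
```

In the case-sensitive result "Users order.mp4" comes before "a.png", as in the search response; the case-insensitive result is the order listed at the top of the question.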
I have created a watch in Elasticsearch to alert me if the ratio of http errors is greater than 15% of total requests over 60 minutes.
I am using chain inputs to generate the dividend and divisor values for my ratio calculation.
In my condition I am using scripting to do the division and check if it is greater than my ratio.
However, whenever I use 2 ctx parameters to do the division, the result is always zero.
If I play with it and use only one ctx parameter, it works fine.
It seems that we cannot use 2 ctx parameters in a condition.
Does anyone know how to get around this?
Below is my watch.
Thanks.
{
"trigger" : {
"schedule" : {
"interval" : "5m"
}
},
"input" : {
"chain":{
"inputs": [
{
"first": {
"search" : {
"request" : {
"indices" : [ "logstash-*" ],
"body" : {
"query" : {
"bool":{
"must": [
{
"match" : {"d.uri": "xxxxxxxx"}
},
{
"match" : {"topic": "xxxxxxxx"}
}
],
"filter": {
"range": {
"@timestamp": {
"gte": "now-60m"
}
}
}
}
}
}
},
"extract": ["hits.total"]
}
}
},
{
"second": {
"search" : {
"request" : {
"body" : {
"query" : {
"bool":{
"must": [
{
"match" : {"d.uri": "xxxxxxxx"}
},
{
"match" : {"topic": "xxxxxxxx"}
},
{
"match" : {"d.status": "401"}
}
],
"filter": {
"range": {
"@timestamp": {
"gte": "now-60m"
}
}
}
}
}
}
},
"extract": ["hits.total"]
}
}
}
]
}
},
"condition" : {
"script" : {
"source" : "return (ctx.payload.second.hits.total / ctx.payload.first.hits.total) == 0"
}
}
}
The issue was that I was doing an integer division to get a ratio of the form 0.xx, and an integer division truncates any such result to zero.
I reversed the operation and it is working fine.
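A small Python sketch of the pitfall, using hypothetical hit counts: dividing two integers with truncation gives 0 whenever the errors are fewer than the total, whereas comparing by cross-multiplication needs no fractional value at all. The function name ratio_exceeded and the numbers are assumptions for illustration; Python's // stands in here for integer division in the watch script (the two agree for non-negative counts).

```python
def ratio_exceeded(errors, total, percent=15):
    """Return True if errors/total is strictly greater than percent/100.

    Uses cross-multiplication so that only integer arithmetic is needed.
    """
    return total > 0 and errors * 100 > total * percent


errors, total = 20, 100  # hypothetical hit counts

truncated = errors // total   # integer division: the fraction is lost
as_float = errors / total     # floating-point division keeps it
alert = ratio_exceeded(errors, total)

print(truncated, as_float, alert)
```

Casting one operand to a floating-point type before dividing is the other common way to avoid the truncation.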
I have created the following index in Elasticsearch:
PUT /try1
{
"mappings" : {
"product" : {
"properties" : {
"name": { "type" : "text" },
"categories": {
"type": "nested",
"properties": {
"range":{"type":"text"}
}
}
}
}
}
}
The range field holds an array of words: ["high","medium","low"]
I need to access the range element inside the nested category. I tried using the following syntax:
GET /try1/product/_search
{
"query": {
"nested" : {
"path" : "categories",
"query" : {
"bool" : {
"must" : [
{ "match" : {"categories.range": "low"} }
]
}
}
}
}
}
However, I am getting an error with the message:
"reason": """failed to create query:...
Can someone please offer a solution to this?
@KGB, can you try writing your query slightly differently, like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"categories.range": "low"
}
}
]
}
}
}
{
"query": {
"nested" : {
"path" : "categories",
"query" : {
"bool" : {
"must" : [
{ "match" : {"categories.range": "low"} }
]
}
}
}
}
}
This worked perfectly
Hi, I want to achieve the equivalent of this in Elasticsearch:
select * from products where brandName = Accu or brandName = Perfor AND cat=lube (anywhere in any field of an Elasticsearch document).
I am using this query in Elasticsearch:
{
"bool": {
"must": {
"query_string": {
"query": "oil"
}
},
"should": [
{
"term": {
"BrandName": "Accu"
}
},
{
"term": {
"BrandName": "Perfor"
}
}
]
}
}
With this query I am not getting the exact combination of results.
You need to add minimum_should_match: 1 to your query and probably use match instead of term if your BrandName field is an analyzed string.
{
"bool" : {
"must" : {
"query_string" : {
"query" : "oil OR lube OR lubricate"
}
},
"minimum_should_match": 1, <---- add this
"should" : [ {
"match" : {
"BrandName" : "Accu"
}
}, {
"match" : {
"BrandName" : "Perfor"
}
} ]
}
}
This query satisfies your condition.
{
"bool" : {
"must" : { "term" : { "cat" : "lube" } },
"should" : [
{ "term" : { "BrandName" : "Accu" } },
{ "term" : { "BrandName" : "Perfor" } }
],
"minimum_should_match" : 1
}
}
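To see why minimum_should_match matters here, the Python sketch below emulates the matching rule (not scoring) on three hypothetical documents. As far as I know, when a bool query has a must clause, its should clauses are optional unless minimum_should_match is set; the sketch takes that as an assumption. The helpers bool_matches and term and the sample documents are made up for illustration; the field names come from the queries above.

```python
def term(field, value):
    """Build a predicate that checks one field for an exact value."""
    return lambda doc: doc.get(field) == value


def bool_matches(doc, must, should, minimum_should_match=0):
    """Emulate bool matching: all must clauses, plus enough should clauses.

    The default of 0 models a bool query that has a must clause and no
    explicit minimum_should_match (an assumption stated in the lead-in).
    """
    if not all(clause(doc) for clause in must):
        return False
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


must = [term("cat", "lube")]
should = [term("BrandName", "Accu"), term("BrandName", "Perfor")]

docs = [
    {"BrandName": "Accu", "cat": "lube"},    # wanted
    {"BrandName": "Other", "cat": "lube"},   # wrong brand
    {"BrandName": "Perfor", "cat": "oil"},   # wrong category
]

without_min = [bool_matches(d, must, should) for d in docs]
with_min = [bool_matches(d, must, should, minimum_should_match=1) for d in docs]
print(without_min, with_min)
```

Without the setting the wrong-brand document still matches; with minimum_should_match set to 1 only the first document does.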
How do you order ES terms aggregations by multiple values?
At the moment I do:
aggs : {
aggName : {
terms : {
field : "foo",
order : { "subAgg.avg" : "desc" }
},
aggs : {
subAgg : {
stats : {
field : "bar"
}
}
}
}
}
The API says you can do:
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
But this does not work, ES throws an error:
Unknown key for a START_ARRAY in [aggName]: [order].
I found something like this in other posts:
order : { "subAgg.avg" : "desc", "subAgg.count" : "desc" }
No error, but not sorted correctly.
My question is: how do I correctly sort by multiple values?
I have ES 1.4.4 installed.
thx
EDITED:
Mapping
{
"mappings" : {
"mymapping" : {
"properties" : {
"foo" : {
"type" : "short"
}
}
}
}
}
Query:
{
query : {
match_all : {}
},
aggs : {
aggName : {
terms : {
field : "foo",
order : [ { "subAgg.avg" : "desc" }, { "subAgg.count" : "desc" } ]
},
aggs : {
subAgg : {
stats : {
field : "foo"
}
}
}
}
}
}
You can try this:
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
From: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
{
"aggs" : {
"countries" : {
"terms" : {
"field" : "artist.country",
"order" : [ { "rock>playback_stats.avg" : "desc" }, { "_count" : "desc" } ]
},
"aggs" : {
"rock" : {
"filter" : { "term" : { "genre" : "rock" }},
"aggs" : {
"playback_stats" : { "stats" : { "field" : "play_count" }}
}
}
}
}
}
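The array form of order applies the criteria in sequence: the second criterion only breaks ties left by the first. The Python sketch below shows the same ordering applied client-side to a list of hypothetical buckets (the keys, averages and counts are made up). It could also serve as a fallback on a version that rejects the array form, assuming the request returns all buckets so that nothing is cut off before the re-sort.

```python
# Hypothetical buckets, shaped loosely like a terms aggregation response
buckets = [
    {"key": "a", "avg": 5.0, "doc_count": 3},
    {"key": "b", "avg": 7.0, "doc_count": 1},
    {"key": "c", "avg": 5.0, "doc_count": 9},
]

# Sort by avg descending, then by doc_count descending to break ties.
# Negating the numeric values turns Python's ascending sort into descending.
ordered = sorted(buckets, key=lambda b: (-b["avg"], -b["doc_count"]))

ordered_keys = [b["key"] for b in ordered]
print(ordered_keys)
```

Buckets "a" and "c" tie on the average, so the count decides between them.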