How to nest bool queries? - elasticsearch

I am building a search query that dynamically adds a set of constraints (bool) to the query. The general expected structure is as follows:
OR (
    AND (
        condition
        condition
        ...
    )
    AND (
        condition
        condition
        ...
    )
)
In other words, I have a set (one or more) of conditions which must all be met (the AND above). There may be several such sets, and matching any one of them should be enough for the final match (the OR above).
An example of such structure, as generated by my code (this is the full API query, the generated part is "bool"):
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
},
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
My understanding of this query was:
if "attack_ip === 10.89.7.117" and "sentinel_port === 17"
or
if "attack_ip === 10.89.7.118"
the entry will match
Unfortunately, when I send this to Elasticsearch I get the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 1,
        "col": 177
      }
    ],
    "type": "parsing_exception",
    "reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 1,
    "col": 177
  },
  "status": 400
}
What does this error mean?
EDIT
Following Piotr's answer, I tried to move the range constraint into the boolean part. I get the same error, though.
My query is reproduced below:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
}
},
{
"range": {
"eventtime": {
"gte": "2018-03-05T13:55:27.927+01:00"
}
}
}
]
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
}

I think the problem is with the range part. Try moving it inside the bool:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"must": {
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
Or move it to the filter section:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"filter": {
"bool": {
"must": [{
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}]
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
I hope I formatted this correctly. Please let me know if you have any issues.
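Finally, you may need to set the minimum_should_match parameter on the bool query to get correct results: once a bool query has a must or filter clause, its should clauses are optional by default, so set it to 1 if at least one of the should groups has to match.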
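Since the question generates these constraints dynamically, here is a minimal Python sketch of a builder. It assumes each AND group arrives as a dict of field to value; `and_group` and `build_search` are illustrative names, not part of any Elasticsearch client. The sketch gives `query` exactly one key and keeps `size` and `aggs` as top-level siblings of `query`. That appears to be what the parser error is about: in both queries from the question, a second key (`range` in the first, `size` in the edit) follows `bool` inside the `query` object.

```python
import json


def and_group(conditions):
    """AND: every field/value pair in the group must match."""
    return {"bool": {"must": [{"term": {f: v}} for f, v in conditions.items()]}}


def build_search(groups, since):
    """OR over the AND groups, restricted to events at or after `since`."""
    return {
        "query": {
            "bool": {
                "should": [and_group(g) for g in groups],
                # With a filter present, should clauses are optional
                # unless at least one match is required explicitly.
                "minimum_should_match": 1,
                "filter": [{"range": {"eventtime": {"gte": since}}}],
            }
        },
        # size and aggs sit next to "query", not inside it.
        "size": 0,
        "aggs": {
            "src": {
                "terms": {"field": "attack_ip", "size": 1000},
                "aggs": {
                    "dst": {"terms": {"field": "sentinel_hostname_lan", "size": 2000}}
                },
            }
        },
    }


if __name__ == "__main__":
    groups = [
        {"attack_ip": "10.89.7.117", "sentinel_port": "17"},
        {"attack_ip": "10.89.7.118"},
    ]
    print(json.dumps(build_search(groups, "2018-03-05T12:47:22.397+01:00"), indent=2))
```

Serializing the dict with `json.dumps` also avoids hand-placed braces, which is where both failing queries went wrong.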

Related

Elasticsearch Composite aggregation with pagination

I am using this query to fetch aggregated results, but because so many documents match the query criteria, the number of buckets is larger than 10000.
How should I write/modify this query so that I can paginate the result?
I have read that bucket aggregation doesn't allow pagination but it can be converted into composite aggregation to support pagination of response.
Any alternate way to paginate the response would also be helpful.
{
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2022-04-26T00:00:00.000Z",
"lte": "2022-04-26T23:59:59.999Z"
}
}
},
{
"terms": {
"job.keyword": [
"JOB_1",
"JOB_2",
"JOB_3"
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"tags.name.keyword": "jobType"
}
},
{
"term": {
"tags.value.keyword": "discrete"
}
}
]
}
}
]
}
}
}
}
]
}
}
]
}
},
"size": 0,
"aggs": {
"job": {
"terms": {
"field": "job.keyword"
},
"aggs": {
"accountId": {
"terms": {
"field": "accountId.keyword",
"size": 10000
},
"aggs": {
"accountUsageStats": {
"stats": {
"field": "count"
}
},
"tags": {
"top_hits": {
"size": 1,
"_source": {
"include": [
"tags"
]
}
}
}
}
}
}
}
}
}
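A composite aggregation can be paged by passing the `after_key` of one response as the `after` parameter of the next request. The sketch below is a minimal Python illustration of that loop, not a tested answer to this question. It assumes the two nested `terms` levels can be flattened into two composite sources, and `search` is a hypothetical callable that sends a request body and returns the parsed response (for example a thin wrapper around a client's search call). The aggregation name `job_account` is also made up for the example.

```python
def composite_body(query, page_size=1000, after=None):
    """Request body for one page of (job, accountId) buckets."""
    composite = {
        "size": page_size,
        "sources": [
            {"job": {"terms": {"field": "job.keyword"}}},
            {"accountId": {"terms": {"field": "accountId.keyword"}}},
        ],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {
        "query": query,
        "size": 0,
        "aggs": {
            "job_account": {
                "composite": composite,
                # Sub-aggregations copied from the question.
                "aggs": {
                    "accountUsageStats": {"stats": {"field": "count"}},
                    "tags": {
                        "top_hits": {"size": 1, "_source": {"include": ["tags"]}}
                    },
                },
            }
        },
    }


def iter_buckets(search, query, page_size=1000):
    """Yield every bucket, requesting pages until none are left."""
    after = None
    while True:
        response = search(composite_body(query, page_size, after))
        agg = response["aggregations"]["job_account"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break
```

Each bucket key is then an object such as `{"job": ..., "accountId": ...}` rather than a nested pair of buckets, so any code that consumes the response has to be adapted accordingly.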

How to use Aggregation by range time and terms

Idea: search for the top events in a specific time range, ordered by start_time, like this:
{
"from": 0,
"size": 7,
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"and": [
{ "bool": { "must_not": { "term": { "status": "OK" } } } },
{ "bool": { "must": { "term": { "is_blocked": false } } } }, {
"range": {
"start_time": {
"gte": "2016-01-01",
"lte": "2016-03-01"
}
}
}, {
"bool": {
"must": {
"geo_distance": {
"distance": "150km",
"coordinates": "xx.xxx, zz.zz "
}
}
}
}
]
}
}
},
"sort": [{ "start_time": "asc" },
{ "attending": "desc" }
]
}
I am quite new to the concept of aggregations, so I still have trouble with the basics.
I want 7 results: the top events for the next 2 months. So there are two attributes to look at. "Top" is defined by the maximum number of people attending (attending), but I also want the results ordered by time (start_time: asc).
This is what I started to write, but it is wrong:
{
  "aggs": {
    "aggs": {
      "event_interval": {
        "date_histogram": {
          "field": "start_time",
          "interval": "2M",
          "format": "dateOptionalTime"
        }
      },
      "max_attending": { "max": { "field": "attending" } },
      "_source": {
        "include": [
          "name"
        ]
      }
    }
  }
}
I'm not sure you need an aggregation to get what you are looking for. I think a simple query can yield the results you would like to see. Try this:
{
  "size": 7,
  "sort": {
    "attending": {
      "order": "desc"
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "start_time": {
              "gte": "now-2M",
              "lte": "now"
            }
          }
        }
      ]
    }
  }
}
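The query above returns the 7 most attended events, but sorted by attendance rather than by time. One way to get both, sketched here in Python, is to let Elasticsearch pick the top 7 and then reorder those few hits by `start_time` on the client. This is a sketch under assumptions: the question asks for the next two months, so the range below runs from `now` to `now+2M` instead of the past window used above, and `start_time` is assumed to be stored as ISO 8601 strings in one consistent format, so plain string comparison orders them chronologically. The function names are illustrative.

```python
def top_events_body(limit=7):
    """Top `limit` events by attendance within the next two months."""
    return {
        "size": limit,
        "sort": [{"attending": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    # Assumption: "next 2 months" means now .. now+2M.
                    {"range": {"start_time": {"gte": "now", "lte": "now+2M"}}}
                ]
            }
        },
    }


def order_by_start_time(hits):
    """Reorder the already selected hits chronologically."""
    return sorted(hits, key=lambda hit: hit["_source"]["start_time"])
```

Here `hits` stands for the `hits.hits` array of the search response; with only 7 documents the client-side sort costs next to nothing.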

Filter OUT matching documents in elasticsearch with aggregation

I'm attempting to query statistics about documents in elasticsearch with the following query. The problem is that I'm trying to ignore documents with certain values for the field logger, but I can't figure out how. The query below selects all the right documents into the set, but it doesn't exclude documents with the undesirable values.
Any suggestions very welcome.
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
},
"must": {
"exists": {
"field": "count"
}
}
}
},
"aggs": {
"keys": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"terms": {
"field": "logger"
},
"aggs": {
"hostnames": {
"terms": {
"field": "hostname"
},
"aggs": {
"pids": {
"terms": {
"field": "pid"
},
"aggs": {
"time_stats": {
"stats": {
"field": "timestamp"
}
},
"count_stats": {
"stats": {
"field": "count"
}
}
}
}
}
}
}
}
},
"size": 0
}
This should work for you as I removed filter and terms from the same level of aggregation.
{
  "query": {
    "bool": {
      "filter": {
        "not": {
          "terms": {
            "logger": [
              "experimentsplitsegmentlogger_errors",
              "ExperimentLogger"
            ]
          }
        }
      },
      "must": {
        "exists": {
          "field": "count"
        }
      }
    }
  },
  "aggs": {
    "keys": {
      "terms": {
        "field": "logger"
      }
    }
  },
  "size": 0
}
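The underlying rule is that each named aggregation declares exactly one aggregation type, plus optional sub-aggregations under `aggs`; the `keys` aggregation in the question declares both `filter` and `terms`. The Python sketch below builds a variant that does the exclusion once, in the query, using the `bool`/`must_not` form from the question rather than the `not` filter, which as far as I know only exists in older Elasticsearch versions. It also restores the `hostnames` and `pids` sub-aggregations that the shortened answer leaves out. The function names are illustrative, and `aggregation_types` is only a small helper for checking the one-type rule.

```python
def logger_stats_body(excluded_loggers):
    """Exclude unwanted loggers in the query, then bucket what remains."""
    per_pid = {
        "time_stats": {"stats": {"field": "timestamp"}},
        "count_stats": {"stats": {"field": "count"}},
    }
    return {
        "query": {
            "bool": {
                "must": {"exists": {"field": "count"}},
                "must_not": {"terms": {"logger": excluded_loggers}},
            }
        },
        "size": 0,
        "aggs": {
            "keys": {
                "terms": {"field": "logger"},
                "aggs": {
                    "hostnames": {
                        "terms": {"field": "hostname"},
                        "aggs": {
                            "pids": {"terms": {"field": "pid"}, "aggs": per_pid}
                        },
                    }
                },
            }
        },
    }


def aggregation_types(agg):
    """Keys of one aggregation other than sub-aggregations and metadata."""
    return [key for key in agg if key not in ("aggs", "aggregations", "meta")]
```

If a filter is needed at the aggregation level instead, the `terms` aggregation would go under the `aggs` of a `filter` aggregation rather than beside it.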

ElasticSearch Query with function score is running more than 10 times slower

Here is my query without function score:
{
"from": 200,
"size": 25,
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"query": {
"terms": {
"cotypes.id": [
199
]
}
},
"path": "cotypes"
}
},
{
"range": {
"relevance": {
"from": 6,
"to": null,
"include_lower": true,
"include_upper": true
}
}
}
],
"must_not": {
"terms": {
"ontologyId": [
1314696,
1314691
]
}
}
}
},
"must": {
"match": {
"name.nameStandard": {
"query": "john smith",
"type": "boolean",
"boost": 10
}
}
}
}
}
}
This query returns the response in ~250ms.
But I need to add a boost factor to improve the default scoring. I modified the query to use function score, but after that the query takes too long (~3000ms).
Here is the function score query:
{
"from": 200,
"size": 25,
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"nested": {
"query": {
"terms": {
"cotypes.id": [
199
]
}
},
"path": "cotypes"
}
},
{
"range": {
"relevance": {
"from": 6,
"to": null,
"include_lower": true,
"include_upper": true
}
}
}
],
"must_not": {
"terms": {
"ontologyId": [
1314696,
1314691
]
}
}
}
},
"must": {
"match": {
"name.nameStandard": {
"query": "john smith",
"type": "boolean",
"boost": 10
}
}
}
}
},
"functions": [
{
"script_score": {
"script": {
"file": "calculate-score",
"lang": "groovy",
"params": {
"relevance_boost": 0.5
}
}
}
}
],
"boost_mode": "sum"
}
}
}
The calculate-score.groovy script is given below:
def penalize = 1
def penalizeClassDict = [
    '226': 0.25,
    '14106': 0.25,
    '656': 0.25
]
for (item in _source.cotypes)
{
    if (penalizeClassDict.containsKey(item.id.toString()))
        penalize = penalize * penalizeClassDict[item.id.toString()]
}
_score + (pow(_source.relevance, relevance_boost)) * 1
Please help me to make the query perform better!
Thank you in advance!
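Two things stand out in the script. It reads `_source`, which has to be loaded and parsed for every scored document and is generally far slower than reading doc values. Also, `penalize` is computed but never used in the final expression. Assuming the intent is simply "query score plus relevance raised to 0.5", a possible script-free alternative is a `field_value_factor` function with the `sqrt` modifier. The Python sketch below only builds that request body; it is an untested suggestion, `with_relevance_boost` is an illustrative name, and the resulting scores are not identical to the script's (the script adds `_score` itself and `boost_mode: sum` then adds the query score again).

```python
def with_relevance_boost(query):
    """Wrap `query` so that sqrt(relevance) is added to its score."""
    return {
        "function_score": {
            "query": query,
            "functions": [
                # sqrt(relevance) equals pow(relevance, 0.5) and is read
                # from doc values instead of _source.
                {"field_value_factor": {"field": "relevance", "modifier": "sqrt"}}
            ],
            "boost_mode": "sum",
        }
    }


def search_body(query, offset=200, limit=25):
    """Full request body with the paging values from the question."""
    return {"from": offset, "size": limit, "query": with_relevance_boost(query)}
```

If the per-class penalty is actually needed, it would have to be expressed separately, for example as additional functions with their own filters, which this sketch does not attempt.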

Elasticsearch, combining nested filter with normal filter

I figured out how to map and filter on nested queries in Elasticsearch. Yay! But what isn't working yet is filtering on both a 'normal' filter and a nested filter. The example you see here doesn't give an error, and the second (nested) filter seems to be working, but the first one isn't. In this example I want both filters to be applied, not just one. What am I doing wrong?
{
"size": 100,
"sort": [],
"query": {
"filtered": {
"query": {
"match_all": []
},
"filter": {
"bool": {
"must": [
{
"terms": {
"category.untouched": [
"Chargers"
]
}
}
],
"should": [],
"must_not": {
"missing": {
"field": "model"
}
}
}
},
"filter": {
"nested": {
"path":"phones",
"filter":{
"bool": {
"must": [
{
"term": {
"phones.name.untouched":"Galaxy S3 Neo I9301"
}
}
]
}
}
}
},
"strategy": "query_first"
}
},
"aggs": {
"category.untouched": {
"terms": {
"field": "category.untouched"
}
},
"brand.untouched": {
"terms": {
"field": "brand.untouched"
}
},
"price_seperate": {
"histogram": {
"field": "price_seperate",
"interval": 10,
"min_doc_count": 1
}
},
"phones.name.untouched": {
"nested": {
"path": "phones"
},
"aggs": {
"phones.name.untouched": {
"terms": {
"field": "phones.name.untouched"
}
}
}
}
}
}
You have two keys with the name "filter" (in "filtered"), so one of them is going to get ignored. You probably just need to wrap your two filters in a "bool" (bools can be nested as needed).
I can't test it without setting up some test data, but try this and see if it gets you closer:
{
"size": 100,
"sort": [],
"query": {
"filtered": {
"query": {
"match_all": []
},
"filter": {
"bool": {
"must": [
{
"terms": {
"category.untouched": [
"Chargers"
]
}
},
{
"nested": {
"path": "phones",
"filter": {
"term": {
"phones.name.untouched": "Galaxy S3 Neo I9301"
}
}
}
}
],
"should": [],
"must_not": {
"missing": {
"field": "model"
}
}
}
},
"strategy": "query_first"
}
},
"aggs": {
"category.untouched": {
"terms": {
"field": "category.untouched"
}
},
"brand.untouched": {
"terms": {
"field": "brand.untouched"
}
},
"price_seperate": {
"histogram": {
"field": "price_seperate",
"interval": 10,
"min_doc_count": 1
}
},
"phones.name.untouched": {
"nested": {
"path": "phones"
},
"aggs": {
"phones.name.untouched": {
"terms": {
"field": "phones.name.untouched"
}
}
}
}
}
}
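The duplicate-key effect is easy to reproduce outside Elasticsearch. The Python sketch below shows that the standard `json` module silently keeps the last value for a repeated key, which matches the symptom in the question (only the second, nested filter takes effect). How Elasticsearch itself treats duplicates may depend on the version, so take this as an illustration of the pitfall rather than a description of its parser. `reject_duplicates` is a hypothetical helper for catching such mistakes before a query is sent.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that fails on repeated keys instead of dropping one."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)


# Reduced version of the query from the question: two "filter" keys.
raw = '{"filtered": {"filter": {"bool": {}}, "filter": {"nested": {}}}}'

# Default behaviour: the first "filter" is lost without any warning.
last_wins = json.loads(raw)

if __name__ == "__main__":
    print(last_wins["filtered"]["filter"])
    try:
        json.loads(raw, object_pairs_hook=reject_duplicates)
    except ValueError as error:
        print(error)
```

Building the query as a dict and serializing it, as in the corrected query above, makes this class of mistake impossible, since a dict cannot hold the same key twice.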

Resources