Why does this query cause 'too many clauses'? - elasticsearch

I have a query with only a few 'shoulds' and 'filters', but one of the filters has a terms query with ~20,000 terms in it. Our max_terms_count is 200k but this is complaining about 'clauses'.
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=too_many_clauses, reason=too_many_clauses: maxClauseCount is set to 1024]
I've written queries containing terms queries with far more terms than this. Why is this query causing a 'too many clauses' error? How can I rewrite this query to get the same result without the error?
{
"query" : {
"bool" : {
"filter" : [
{
"nested" : {
"query" : {
"range" : {
"dateField" : {
"from" : "2019-12-03T21:34:30.653Z",
"to" : "2020-12-02T21:34:30.653Z",
"include_lower" : true,
"include_upper" : true,
"boost" : 1.0
}
}
},
"path" : "observed_feeds",
"ignore_unmapped" : false,
"score_mode" : "none",
"boost" : 1.0
}
}
],
"should" : [
{
"bool" : {
"filter" : [
{
"terms" : {
"ipAddressField" : [
"123.123.123.123",
"124.124.124.124",
... like 20,000 of these
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"minimum_should_match" : "1",
"boost" : 1.0
}
}
}
Edit: The reason I'm wrapping the terms query in a should -> bool is that sometimes we need several terms queries OR'd together. This query just happens not to be one of those cases.

You are hitting this with the terms query because the should clause sits outside the filter clause, so it runs in query context and contributes to the score calculation. That is why these terms are subject to max_clause_count. If you don't need scoring for that part, you can move the should clause into the filter array and rephrase your query as below:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"query": {
"range": {
"dateField": {
"from": "2019-12-03T21:34:30.653Z",
"to": "2020-12-02T21:34:30.653Z",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
"path": "observed_feeds",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1
}
},
{
"bool": {
"should": [
{
"bool": {
"filter": [
{
"terms": {
"ipAddressField": [
"123.123.123.123",
"124.124.124.124",
... like 20,000 of these
],
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
]
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
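If the query body is assembled in code, the same filter-context structure can be generated programmatically. The following is a minimal sketch in plain Java that renders the JSON by hand rather than through the Elasticsearch client; the class name `IpFilterQuery` and the method `buildBody` are hypothetical, and it covers only the terms part of the query (the nested range filter would be one more element of the same filter array). Each map entry becomes one terms clause, and the clauses are OR'd inside a should that itself sits under filter, which also covers the case from the question's edit where several terms queries must be OR'd together.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class IpFilterQuery {

    // Minimal JSON string escaping; assumes values contain no control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders {"terms":{"<field>":[v1,v2,...]}}.
    static String termsClause(String field, List<String> values) {
        StringJoiner vals = new StringJoiner(",", "[", "]");
        for (String v : values) {
            vals.add(quote(v));
        }
        return "{\"terms\":{" + quote(field) + ":" + vals + "}}";
    }

    // bool.filter[ bool.should[ terms, terms, ... ] ]: the terms clauses are OR'd,
    // and because they live under filter, none of them contributes to scoring.
    static String buildBody(Map<String, List<String>> termsByField) {
        StringJoiner should = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, List<String>> e : termsByField.entrySet()) {
            should.add(termsClause(e.getKey(), e.getValue()));
        }
        return "{\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"should\":" + should
                + ",\"minimum_should_match\":1}}]}}}";
    }

    public static void main(String[] args) {
        // Hypothetical data: 20,000 generated addresses standing in for the real list.
        List<String> ips = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            ips.add("10." + (i / 65536) + "." + ((i / 256) % 256) + "." + (i % 256));
        }
        Map<String, List<String>> terms = new LinkedHashMap<>();
        terms.put("ipAddressField", ips);
        String body = buildBody(terms);
        System.out.println(body.substring(0, 80) + "...");
    }
}
```

Setting `minimum_should_match` explicitly keeps the OR semantics unambiguous regardless of which default your Elasticsearch version applies to a should-only bool in filter context.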

Related

Elasticsearch: combining multiple range and term filters with AND and OR operators

I have a filter with multiple date range filters combined with AND and OR operators. I need results that satisfy both date range filters, or either one of them.
{
"query":{
"bool" : {
"must" : [
{
"match_phrase_prefix" : {
"searchField" : {
"query" : "Adam",
"slop" : 0,
"max_expansions" : 50,
"boost" : 1.0
}
}
}
],
"filter" : [
{
"term" : {
"srvcType" : {
"value" : "FullTime",
"boost" : 1.0
}
}
},
{"range" : { "or": {"startDt": {"from" : "2010-05-16","to" : "2022-02-18","include_lower": true,"include_upper" : true,"boost" : 1.0}} }},
{"range" : { "or": {"endDt": {"from" : "2015-05-16","to" : "2022-02-18","include_lower" : true,"include_upper" : true,"boost" : 1.0}}}}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
When I ran the query above, I got a parsing_exception: query does not support StartDt.
{
"query":{
"bool" : {
"must" : [
{
"match_phrase_prefix" : {
"searchField" : {
"query" : "Adam",
"slop" : 0,
"max_expansions" : 50,
"boost" : 1.0
}
}
}
],
"filter" : [
{
"term" : {
"srvcType" : {
"value" : "FullTime",
"boost" : 1.0
}
}
},
{"range" : {"startDt": {"from" : "2010-05-16","to" : "2022-02-18","include_lower": true,"include_upper" : true,"boost" : 1.0}} },
{"range" : {"endDt": {"from" : "2015-05-16","to" : "2022-02-18","include_lower" : true,"include_upper" : true,"boost" : 1.0}}}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
}
If you need AND semantics for your date range filters, you can keep both range queries in the bool/filter array, as above.
However, if you need OR semantics, you can use a bool/should query with minimum_should_match set to 1, like below:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"searchField": {
"query": "Adam",
"slop": 0,
"max_expansions": 50,
"boost": 1
}
}
}
],
"filter": [
{
"term": {
"srvcType": {
"value": "FullTime",
"boost": 1
}
}
}
],
"minimum_should_match": 1,
"should": [
{
"range": {
"startDt": {
"from": "2010-05-16",
"to": "2022-02-18",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
},
{
"range": {
"endDt": {
"from": "2015-05-16",
"to": "2022-02-18",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
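To make the difference between the two versions concrete, here is a toy in-memory model in plain Java of how the filter and should arrays combine for matching (scoring ignored). It is not the Elasticsearch API: `DateRangeOr`, `Doc` and `matches` are hypothetical names, and the must clause with match_phrase_prefix is left out. With `minimum_should_match` 1 either range suffices (OR); with 2, both ranges must match, which is the same as putting both in the filter array (AND).

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Predicate;

public class DateRangeOr {

    // Hypothetical in-memory stand-in for an indexed document.
    static final class Doc {
        final String srvcType;
        final LocalDate startDt;
        final LocalDate endDt;

        Doc(String srvcType, String startDt, String endDt) {
            this.srvcType = srvcType;
            this.startDt = LocalDate.parse(startDt);
            this.endDt = LocalDate.parse(endDt);
        }
    }

    // Inclusive on both ends, like "include_lower": true and "include_upper": true.
    static boolean inRange(LocalDate d, String from, String to) {
        return !d.isBefore(LocalDate.parse(from)) && !d.isAfter(LocalDate.parse(to));
    }

    // The term filter on srvcType.
    static final List<Predicate<Doc>> FILTER = List.of(d -> d.srvcType.equals("FullTime"));

    // The two range queries from the should array.
    static final List<Predicate<Doc>> SHOULD = List.of(
            d -> inRange(d.startDt, "2010-05-16", "2022-02-18"),
            d -> inRange(d.endDt, "2015-05-16", "2022-02-18"));

    // Every filter clause must match, and at least minimumShouldMatch should clauses must match.
    static boolean matches(Doc doc, int minimumShouldMatch) {
        for (Predicate<Doc> f : FILTER) {
            if (!f.test(doc)) {
                return false;
            }
        }
        long matched = SHOULD.stream().filter(p -> p.test(doc)).count();
        return matched >= minimumShouldMatch;
    }

    public static void main(String[] args) {
        Doc onlyEndInRange = new Doc("FullTime", "2009-01-01", "2016-01-01");
        System.out.println("OR  (minimum_should_match 1): " + matches(onlyEndInRange, 1));
        System.out.println("AND (minimum_should_match 2): " + matches(onlyEndInRange, 2));
    }
}
```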

How do I combine elasticsearch filter queries with OR using the Java API

I have these fields:
country (can be "DK", "US", "UK" and so on)
media ("book", "ebook", "cd")
state ("active", "inactive")
I would like to search for all documents that have country="DK" AND ((media="book" AND state="inactive") OR (media="ebook" AND state="active"))
I am creating a BoolQueryBuilder like this:
BoolQueryBuilder bqb = QueryBuilders
.boolQuery()
.must(QueryBuilders.termsQuery("country", "DK"));
bqb.filter(QueryBuilders.termsQuery("media", "book"));
bqb.filter(QueryBuilders.termsQuery("state", "inactive"));
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(bqb)
.build();
From what I understand from the Stack Overflow question "elasticsearch bool query combine must with OR", I should create a query like this:
{
"query": {
"bool": {
"must": [
{
"term": {"country": "DK"}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{"term": {"media": "book"}},
{"term": {"state": "inactive"}}
]
}
},
{
"bool": {
"must": [
{"term": {"media": "ebook"}},
{"term": {"state": "active"}}
]
}
}
]
}
}
]
}
}
}
Is this correct?
How do I do this with the Java API?
After some trial and error, this seems to work:
BoolQueryBuilder bqb = QueryBuilders
.boolQuery()
.must(QueryBuilders.termsQuery("country", "DK"));
BoolQueryBuilder query1 = QueryBuilders.boolQuery();
query1.filter(QueryBuilders.termsQuery("media", "book"));
query1.filter(QueryBuilders.termsQuery("state", "inactive"));
BoolQueryBuilder query2 = QueryBuilders.boolQuery();
query2.filter(QueryBuilders.termsQuery("media", "ebook"));
query2.filter(QueryBuilders.termsQuery("state", "active"));
bqb.filter(QueryBuilders.boolQuery().should(query1).should(query2));
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(bqb)
.build();
And it generated this query:
{
"bool" : {
"must" : [
{
"terms" : {
"country" : [
"DK"
],
"boost" : 1.0
}
}
],
"filter" : [
{
"bool" : {
"should" : [
{
"bool" : {
"filter" : [
{
"terms" : {
"media" : [
"book"
],
"boost" : 1.0
}
},
{
"terms" : {
"state" : [
"inactive"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
{
"bool" : {
"filter" : [
{
"terms" : {
"media" : [
"ebook"
],
"boost" : 1.0
}
},
{
"terms" : {
"state" : [
"active"
],
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
],
"adjust_pure_negative" : true,
"boost" : 1.0
}
}

ElasticSearch: Multi_fields parameter not working

I have a multi-field "startTime"; here is the mapping:
"startTime" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"raw" : {
"type" : "date",
"format" : "dd-MM-yyyy HH:mm:ss||dd-MM-yyyy||hour_minute_second"
}
}
}
I inserted a few documents:
{
"orgId": "backendorg",
"startTime": "01-01-1980 06:32:51"
}
{
"orgId": "backendorg",
"startTime": "01-01-1980 06:35:51"
}
{
"orgId": "backendorg",
"startTime": "01-01-1980 06:39:51"
}
When I filter on startTime with the query below, it returns an empty result:
{
"query": {
"bool": {
"must": [
{
"term": {
"orgId": {
"value": "backendorg",
"boost": 1
}
}
},
{
"bool": {
"should": [
{
"range": {
"startTime": {
"from": "01-01-1980 06:32:51",
"to": "01-01-1980 06:39:51",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
]
}
}
]
}
}
}
Could someone tell me what is wrong with my query or mapping?
Since your date field is the sub-field startTime.raw, you need to use that name in your range query. The top-level startTime field is mapped as text, so a range on it is not compared as a date:
{
"range": {
"startTime.raw": { <----- change this
"from": "01-01-1980 06:32:51",
"to": "01-01-1980 06:39:51",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
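To see why the three documents should match once the query targets startTime.raw, here is a plain java.time sketch of the date comparison: it parses values with the first pattern from the mapping's format (dd-MM-yyyy HH:mm:ss) and checks the inclusive bounds. It models only the parse-and-compare step, not Elasticsearch's indexing; it assumes java.time reads this simple pattern the same way Elasticsearch does, and `StartTimeRange` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class StartTimeRange {

    // Same pattern as the first alternative in the startTime.raw "format".
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm:ss");

    // Chronological comparison with inclusive bounds ("include_lower"/"include_upper": true).
    static boolean inRange(String value, String from, String to) {
        LocalDateTime t = LocalDateTime.parse(value, FMT);
        return !t.isBefore(LocalDateTime.parse(from, FMT))
                && !t.isAfter(LocalDateTime.parse(to, FMT));
    }

    public static void main(String[] args) {
        List<String> startTimes = List.of(
                "01-01-1980 06:32:51", "01-01-1980 06:35:51", "01-01-1980 06:39:51");
        for (String v : startTimes) {
            System.out.println(v + " -> " + inRange(v, "01-01-1980 06:32:51", "01-01-1980 06:39:51"));
        }
    }
}
```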

How to build an AND condition between should and must in an Elasticsearch bool query

Here is the sample USER document
{
"id" : "1234567",
"userId" : "testuser01",
"firstName" : "firstname",
"lastName" : "lastname",
"orgId" : "567890",
"phoneNumber" : "1234567890"
}
I want to build a search query that returns all users who belong to a particular orgId AND match the search text entered by the user in any of the fields (userId, firstName, etc.).
For example, if the search text is "first", I want all records that belong to the given orgId AND contain "first" in one of the fields.
The query I am trying is:
{
"query" : {
"bool" : {
"must" : [
{
"term" : {
"orgId.keyword" : {
"value" : "567890",
"boost" : 1.0
}
}
}
],
"should" : [
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"lastName^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
},
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"userId^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
},
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"orgId^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
},
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"firstName^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
},
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"phoneNumber^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
},
{
"simple_query_string" : {
"query" : "first*",
"fields" : [
"id^1.0"
],
"flags" : -1,
"default_operator" : "or",
"lenient" : false,
"analyze_wildcard" : true,
"boost" : 1.0
}
}
],
"disable_coord" : false,
"adjust_pure_negative" : true,
"boost" : 1.0
}
},
"sort" : [
{
"userId.keyword" : {
"order" : "asc"
}
}
]
}
The issue I am facing is that I want an AND condition between the must and should clauses.
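Because this bool query has a must clause, its should clauses are optional by default (minimum_should_match defaults to 0), so they only affect scoring. You also don't need a separate simple_query_string query for each field; instead, list all the fields in one query and put it in the must clause: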
{
"query": {
"bool": {
"must": [
{
"term": {
"orgId.keyword": {
"value": "567890",
"boost": 1
}
}
},
{
"simple_query_string": {
"query": "first*",
"fields": [
"lastName^1.0",
"userId^1.0",
"orgId^1.0",
"firstName^1.0",
"phoneNumber^1.0",
"id^1.0"
]
}
}
]
}
},
"sort": [
{
"userId.keyword": {
"order": "asc"
}
}
]
}
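Also, to answer "How to build an AND condition between should and must in an Elasticsearch bool query?": wrap the should clauses in their own bool query and add that bool as a must clause. Because the inner bool has only should clauses, at least one of them has to match. Here is a sample query: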
{
"query": {
"bool": {
"must": [
{
"term": {
"field1": "someval"
}
},
{
"bool": {
"should": [
{
"terms": {
"field2": [
"v1",
"v2"
]
}
},
{
"query_string": {
"query": "this AND that OR thus"
}
}
]
}
}
]
}
}
}
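The default that causes the original problem can be illustrated with a toy model in plain Java. This is not the Elasticsearch API: `Clause`, `Bool`, `orgIs` and `startsWith` are hypothetical, filter and must_not are omitted, scoring is ignored, and `startsWith` is only a crude stand-in for `simple_query_string` with `first*`. The model applies the default documented for recent versions: minimum_should_match is 1 when a bool has should clauses and no must clauses, otherwise 0.

```java
import java.util.List;
import java.util.Map;

public class BoolSemantics {

    // Toy stand-in for a query clause; a "document" is a field -> value map.
    interface Clause {
        boolean matches(Map<String, String> doc);
    }

    // Toy model of bool matching rules (scoring, filter and must_not omitted).
    static final class Bool implements Clause {
        private final List<Clause> must;
        private final List<Clause> should;

        Bool(List<Clause> must, List<Clause> should) {
            this.must = must;
            this.should = should;
        }

        @Override
        public boolean matches(Map<String, String> doc) {
            for (Clause c : must) {
                if (!c.matches(doc)) {
                    return false;
                }
            }
            // Default minimum_should_match: 1 when there are should clauses and no
            // must clauses; otherwise 0, so the should clauses become optional.
            int minimumShouldMatch = (!should.isEmpty() && must.isEmpty()) ? 1 : 0;
            long matched = should.stream().filter(c -> c.matches(doc)).count();
            return matched >= minimumShouldMatch;
        }
    }

    // Hypothetical stand-in for the term query on orgId.keyword.
    static Clause orgIs(String orgId) {
        return doc -> orgId.equals(doc.get("orgId"));
    }

    // Crude, hypothetical stand-in for "first*": case-insensitive prefix match on one field.
    static Clause startsWith(String field, String prefix) {
        return doc -> {
            String v = doc.get(field);
            return v != null && v.toLowerCase().startsWith(prefix.toLowerCase());
        };
    }

    static final List<Clause> SEARCH = List.of(
            startsWith("userId", "first"),
            startsWith("firstName", "first"),
            startsWith("lastName", "first"));

    // Question's shape: must[orgId] plus top-level should[...]; the should list is optional.
    static final Clause FLAT = new Bool(List.of(orgIs("567890")), SEARCH);

    // Answer's shape: must[orgId, bool{should[...]}]; the inner bool needs one match.
    static final Clause NESTED = new Bool(
            List.of(orgIs("567890"), new Bool(List.of(), SEARCH)), List.of());

    public static void main(String[] args) {
        Map<String, String> hit = Map.of("orgId", "567890", "firstName", "firstname");
        Map<String, String> miss = Map.of("orgId", "567890", "firstName", "john");
        System.out.println("flat:   " + FLAT.matches(hit) + " " + FLAT.matches(miss));
        System.out.println("nested: " + NESTED.matches(hit) + " " + NESTED.matches(miss));
    }
}
```

In this model the flat shape returns the "john" document even though no field contains "first", while the nested shape rejects it, which mirrors the behavior the question describes.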

How to run multiple random queries in ES without repeating results?

elasticsearch version 5.0
I have a requirement to randomly query user information multiple times, but the final result cannot have duplicate data.
For example,
the first random query result
user0 user1 user2
the second random query result
user0 user3 user4
User0 is a duplicate.
This is my random query. How can I modify it?
{
"size" : 10,
"query" : {
"match_all" : {
"boost" : 1.0
}
},
"_source" : {
"includes" : [
],
"excludes" : [ ]
},
"sort" : [
{
"_script" : {
"script" : {
"inline" : "Math.random()",
"lang" : "painless"
},
"type" : "number",
"order" : "asc"
}
}
],
"ext" : { }
}
{
"size": 1,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "1477072619038"
}
}
]
}
}
}
For details, see the random_score section of the function_score documentation: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-function-score-query.html#function-random
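You can use a bool query with a must_not clause containing an ids query to exclude the ids of previously retrieved documents. Note that match_all and bool cannot both sit directly under "query"; put match_all inside the bool instead:
{
"query": {
"bool": {
"must": [
{
"match_all": {
"boost": 1.0
}
}
],
"must_not": [
{
"ids": {
"values": [The set of previous Ids]
}
}
]
}
},
...
}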
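The two answers can be combined: random_score for the ordering and must_not/ids for the exclusion. Below is a minimal plain-Java sketch that renders such a request body by hand from the set of ids seen so far; `RandomWithoutRepeats` and `buildBody` are hypothetical names, the ids are assumed to be plain strings, and the seed is passed as a string as in the answer above. It targets the 5.x syntax from the question; later versions expect a `field` alongside `seed` in random_score, so check the documentation for your version.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.StringJoiner;

public class RandomWithoutRepeats {

    // Minimal JSON string escaping; assumes ids contain no control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // function_score with random_score over match_all, excluding every id already
    // returned in an earlier round via bool.must_not.ids.
    static String buildBody(Set<String> seenIds, String seed, int size) {
        StringJoiner ids = new StringJoiner(",", "[", "]");
        for (String id : seenIds) {
            ids.add(quote(id));
        }
        return "{\"size\":" + size
                + ",\"query\":{\"function_score\":{"
                + "\"query\":{\"bool\":{\"must\":[{\"match_all\":{}}],"
                + "\"must_not\":[{\"ids\":{\"values\":" + ids + "}}]}},"
                + "\"functions\":[{\"random_score\":{\"seed\":" + quote(seed) + "}}]}}}";
    }

    public static void main(String[] args) {
        Set<String> seen = new LinkedHashSet<>();
        // Hypothetical ids returned by the first round (user0 user1 user2).
        seen.add("user0");
        seen.add("user1");
        seen.add("user2");
        System.out.println(buildBody(seen, "1477072619038", 3));
    }
}
```

After each round, add the returned ids to the set before building the next request; the exclusion list grows with every round.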
