Elasticsearch bool query join order

I'd like to understand the order in which Elasticsearch executes the query clauses (must, should, filter, must_not) of a bool query. Here is the sample query from the ES docs:
{ "query": {
"bool" : {
"must" : {
"term" : { "user.id" : "kimchy" }
},
"filter": {
"term" : { "tags" : "production" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tags" : "env1" } },
{ "term" : { "tags" : "deployed" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
} } }
From the documentation, it looks like the query clauses are joined with an AND condition. For example, the SQL counterpart of the search DSL above would look roughly like this:
select * from user where user_id = 'kimchy' and tags in ('production') and not (age >= 10 and age <= 20) and tags in ('env1', 'deployed');
I wasn't able to find official documentation on this, but I did see some texts saying that ES query evaluation depends heavily on certain cost approximations. I'm wondering how to map the ordering to SQL-like syntax so that we can develop a clear mental picture when authoring ES queries. It also feels like ordering might have some effect on deeply nested boolean AND/OR queries.
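As a mental model for the clause semantics discussed above, here is a minimal Python sketch that evaluates a bool query against an in-memory document. It only mirrors the matching logic (must and filter are ANDed, must_not is negated, should is counted against minimum_should_match); it says nothing about execution order. As far as I know, Lucene orders the evaluation of required clauses by estimated cost rather than by their position in the JSON. The helper names (leaf_matches, as_list, bool_matches) are made up for this illustration, and only integer minimum_should_match values are handled.

```python
# Toy model of bool-query matching on flat dicts. Not Elasticsearch code.

def leaf_matches(doc, clause):
    """Evaluate a single term or range clause against a flat dict."""
    kind, body = next(iter(clause.items()))
    field, spec = next(iter(body.items()))
    value = doc.get(field)
    if kind == "term":
        # A multi-valued field matches if any of its values equals the term.
        values = value if isinstance(value, list) else [value]
        return spec in values
    if kind == "range":
        return value is not None and spec["gte"] <= value <= spec["lte"]
    raise ValueError(f"unsupported clause type: {kind}")


def as_list(clauses):
    """The DSL accepts either a single clause or a list of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def bool_matches(doc, bool_query):
    must = as_list(bool_query.get("must"))
    filt = as_list(bool_query.get("filter"))
    must_not = as_list(bool_query.get("must_not"))
    should = as_list(bool_query.get("should"))
    # should is optional when must or filter is present; otherwise one must match.
    default_msm = 0 if (must or filt) else 1
    msm = bool_query.get("minimum_should_match", default_msm) if should else 0
    return (
        all(leaf_matches(doc, c) for c in must + filt)
        and not any(leaf_matches(doc, c) for c in must_not)
        and sum(leaf_matches(doc, c) for c in should) >= msm
    )


query = {
    "must": {"term": {"user.id": "kimchy"}},
    "filter": {"term": {"tags": "production"}},
    "must_not": {"range": {"age": {"gte": 10, "lte": 20}}},
    "should": [{"term": {"tags": "env1"}}, {"term": {"tags": "deployed"}}],
    "minimum_should_match": 1,
}
doc = {"user.id": "kimchy", "tags": ["production", "env1"], "age": 30}
print(bool_matches(doc, query))
```

Because the result is a plain conjunction of the four groups, reordering the clauses in the JSON does not change which documents match.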

Related

Applying increasingly slow query filters depending on the number of matches

Is there a way of building an ES query so that it doesn't apply slower parts, such as wildcard searches or searches over additional fields, if the number of results from the previous conditions already reaches the specified query size?
I assume this would mean setting aside the totalHits value.
I have tried playing with the boosting setting, but ES, as expected, applies all the combinations.
{
"size" : 5,
"query": {
"bool": {
"should" : [
{ "term" : { "search.autocomplete" : { "value" : "120", "boost" : 20 } }},
{ "term" : { "search.autocomplete_inverse" : { "value" : "120", "boost" : 15 } }},
{ "match" : { "search.keyword" : { "query" : "120", "boost" : 10 } }},
{ "wildcard" : { "brand.search" : { "value" : "*120*", "boost": 5}}},
{ "wildcard" : { "category.search" : { "value" : "*120*", "boost": 0}}}
]
}
}
}
I am looking for a way so that, if the first condition matches 5 or more docs, ES doesn't spend more time trying to find more matches.
A different approach would be to execute multiple queries in my application until I reach the desired number of results, but it doesn't feel right...
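The application-side approach mentioned above could be sketched roughly as follows. This is only an illustration: run_query is a hypothetical callable standing in for whatever client call sends a query body to Elasticsearch and returns a list of hits, and the in-memory FAKE_INDEX exists purely to make the sketch runnable.

```python
from typing import Callable, Dict, List


def staged_search(run_query: Callable[[Dict, int], List[Dict]],
                  stages: List[Dict], size: int) -> List[Dict]:
    """Run queries from cheapest to slowest until `size` hits are collected."""
    hits: List[Dict] = []
    seen = set()
    for query in stages:
        if len(hits) >= size:
            break  # enough results: the remaining, slower stages are skipped
        # Limitation: a stage may return only duplicates of earlier hits.
        # A fuller version could exclude the ids in `seen` inside the query.
        for hit in run_query(query, size):
            if hit["_id"] not in seen:
                seen.add(hit["_id"])
                hits.append(hit)
                if len(hits) == size:
                    break
    return hits


# Hypothetical stand-in for a real search call, keyed by a stage name.
FAKE_INDEX = {
    "term": [{"_id": "1"}, {"_id": "2"}],
    "wildcard": [{"_id": "2"}, {"_id": "3"}],
}


def fake_run_query(query: Dict, size: int) -> List[Dict]:
    return FAKE_INDEX[query["stage"]][:size]


stages = [{"stage": "term"}, {"stage": "wildcard"}]
print(staged_search(fake_run_query, stages, size=2))
```

The trade-off is extra round trips when the cheap stages come up short, in exchange for never running the wildcard stage when they do not.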

How to query on multiple fields in elasticsearch?

I have tried the multiple field query and it works fine. But I would like to know what other options are generally used to query multiple fields in Elasticsearch.
To find exact values, as you would in SQL, you can use structured queries with multiple terms:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_multiple_exact_values.html
"bool" : {
"must" : [
{ "term" : { "tags" : "search" } },
{ "term" : { "tag_count" : 1 } }
]
}
For example, consider the following SQL query:
SELECT product
FROM products
WHERE (price = 20 OR productID = "XHDK-A-1293-#fJ3")
AND (price != 30)
In these situations, you will need the bool filter. This is a compound filter that accepts other filters as arguments, combining them in various Boolean combinations.
The Query DSL would be:
GET /my_store/products/_search
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}
See the following link for the documentation:
https://www.elastic.co/guide/en/elasticsearch/guide/current/combining-filters.html
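Note that the filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause takes its place. A minimal sketch of that rewrite, using a hypothetical helper named filtered_to_bool that only handles a top-level filtered query:

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {...}} as the equivalent {"bool": {...}} query."""
    if "filtered" not in query:
        return query  # nothing to rewrite
    filtered = query["filtered"]
    bool_body = {}
    if "query" in filtered:
        bool_body["must"] = filtered["query"]     # scored part
    if "filter" in filtered:
        bool_body["filter"] = filtered["filter"]  # non-scoring part
    return {"bool": bool_body}


old_query = {
    "filtered": {
        "filter": {
            "bool": {
                "should": [
                    {"term": {"price": 20}},
                    {"term": {"productID": "XHDK-A-1293-#fJ3"}},
                ],
                "must_not": {"term": {"price": 30}},
            }
        }
    }
}
print(filtered_to_bool(old_query))
```

The inner bool keeps its should and must_not clauses unchanged; only the outer wrapper differs.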

ElasticSearch : IN equivalent operator in ElasticSearch

I am trying to find the Elasticsearch query equivalent to IN / NOT IN in SQL.
I know we can use a QueryString query with multiple ORs to get the same answer, but that ends up with a lot of ORs.
Can anyone share an example?
Similar to what Chris suggested in a comment, the analogous replacement for IN is the terms filter (queries imply scoring, which may improve the returned order).
SELECT * FROM table WHERE id IN (1, 2, 3);
The equivalent Elasticsearch 1.x filter would be:
{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
The equivalent Elasticsearch 2.x+ filter would be:
{
"query" : {
"bool" : {
"filter" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
The important takeaway is that the terms filter (and the terms query, for that matter) works on exact matches. It is implicitly an OR operation, similar to IN.
If you wanted to invert it, you could use the not filter, but I would suggest using the slightly more verbose bool/must_not filter (to get in the habit of also using bool/must and bool).
{
"query" : {
"bool" : {
"must_not" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
Overall, the bool compound query syntax is one of the most important filters in Elasticsearch, as are the term (singular) and terms filters (plural, as shown).
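To tie the IN and NOT IN cases together, here is a small sketch that builds either request body in the 2.x+ syntax shown above. The function name sql_in and its negate flag are invented for this example and are not part of any Elasticsearch client.

```python
import json


def sql_in(field, values, negate=False):
    """Build a search body for `field IN (values)` or, with negate, `NOT IN`."""
    terms = {"terms": {field: list(values)}}
    clause = "must_not" if negate else "filter"
    return {"query": {"bool": {clause: terms}}}


print(json.dumps(sql_in("id", [1, 2, 3]), indent=2))
print(json.dumps(sql_in("id", [1, 2, 3], negate=True), indent=2))
```

Both variants use non-scoring clauses, so neither affects relevance scores.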
1. terms
You can use the terms query in Elasticsearch, which acts like IN.
The terms query checks whether the field value matches any of the values in the provided array.
2. must_not
must_not can be used as NOT in Elasticsearch.
Example:
GET my_index/my_type/_search
{
"query" : {
"bool" : {
"must":[
{
"terms": {
"id" : ["1234","12345","123456"]
}
},
{
"bool" : {
"must_not" : [
{
"match":{
"id" : "123"
}
}
]
}
}
]
}
}
}
exists
If it helps, you can also use the exists query to check whether a field exists.
For example, to check that the field exists:
"exists" : {
"field" : "mobileNumber"
}
To check that the field does not exist:
"bool":{
"must_not" : [
{
"exists" : {
"field" : "mobileNumber"
}
}
]
}
Here is an example based on what you asked for.
I hope it helps you solve your problem.
SQL query:
select * from tablename where fieldname in ('AA','BB');
Elasticsearch:
{
"query" : {
"bool" : {
"must" : [{
"script" : {
"script" : {
"inline" : "(doc['fieldname'].value.toString().substring(0,2).toUpperCase() in ['AA','BB']) == true"
}
}
}],
"should" : [],
"must_not" : []
}
}
}

Elastic(search): How to structure nested queries correctly?

I'm currently quite confused about how queries are structured in Elastic. Let me explain what I mean with the following template, which works fine for me:
{
"template" : {
"query" : {
"filtered" : {
"query" : {
"bool" : {
"must" : [
{ "match" : {
"user" : "{{param_user}}"
} },
{ "match" : {
"session" : "{{param_session}}"
} },
{ "range" : {
"date" : {
"gte" : "{{param_from}}",
"lte" : "{{param_to}}"
}
} }
]
}
}
}
}
}
}
OK, so I want to get the entries of a specific session of a user in a certain time period. Now, if you take a look at this link http://www.elastic.co/guide/en/elasticsearch/guide/current/combining-filters.html you can find the following query:
{
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"should" : [
{ "term" : {"price" : 20}},
{ "term" : {"productID" : "XHDK-A-1293-#fJ3"}}
],
"must_not" : {
"term" : {"price" : 30}
}
}
}
}
}
}
In this example, the "filter" keyword comes right after "filtered". However, if I replace my second "query" with "filter" as in the example, my template stops working. This is really counterintuitive, and I spent a lot of time figuring it out.
Another issue I had was that I assumed that, to match several fields, I could just type something like:
{
"query" : {
"match" : {
"user" : "{{param_user}}",
"session" : "{{param_session}}"
}
}
}
but it seemed that I had to use a bool query, which I didn't know about, so I searched for 'elastic multi match' but got something completely different.
My question: where can I find out how to structure a query properly (something like a PEG)? The documentation only gives basic examples but doesn't state what we can actually do and how.
Best regards,
Jan
Edit: OK, I just found out by accident that I cannot exchange "query" with "filter", because "match" is a query and not a filter. But then again, what about "range"? It seems to be a query as well as a filter... Is there a summary of keywords specifying in which context they can be used?
Is there a summary of keywords specifying in which context they can be used?
I wouldn't consider them keywords. It's just that some queries and filters share the same names (but not all of them).
Here is everything you need. For example, there is both a range query and a range filter. All you need is to understand the difference between filters and queries.
For example, if you want to move the range section from the query to the filter, you can do it as shown in the code below (not tested). Since your code already contains a filtered query, you can just create a filter section right after the query section.
{
"template": {
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"match": {
"user": "{{param_user}}"
}
},
{
"match": {
"session": "{{param_session}}"
}
}
]
}
},
"filter": {
"range": {
"date": {
"gte": "{{param_from}}",
"lte": "{{param_to}}"
}
}
}
}
}
}
}
Just remember that you can only filter on not-analyzed fields.
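On the multi-field match from the question: a match clause takes a single field, so the usual way to require several fields at once is a bool query with one match clause per field. A minimal sketch, with a made-up helper name (match_every_field) and made-up example values:

```python
import json


def match_every_field(field_values):
    """Build a bool query that requires a match on every given field."""
    must = [{"match": {field: value}} for field, value in field_values.items()]
    return {"query": {"bool": {"must": must}}}


body = match_every_field({"user": "jan", "session": "abc123"})
print(json.dumps(body, indent=2))
```

Each field gets its own clause object inside the must array, which is why the two-field match in the question is rejected.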

ElasticSearch using wildcard and term queries

I'm new to Elasticsearch, and I have never used Lucene either.
I built this query:
{
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"query" : {
"term" : { "first" : "1" }
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}
The wildcard is working great, but the term filter is ignored. What did I do wrong?
I need to filter the results with both the wildcard and the term.
Thanks!
Assuming you are trying to apply the filter to the wildcard query results, you can use a FilteredQuery. However, your case might be a better fit for a filter.
You are using a query filter. Instead, you can use a TermFilter directly in a FilteredQuery rather than making a filter out of a TermQuery. TermFilter should be faster, as it uses the TermsEnum directly.
Note that the results of filters are cached in a FilterCache, and filters are faster because they do not score documents. In your case, even though the filter part of the FilteredQuery will be fast, the wildcard query will still do unnecessary scoring. You could try using an AND filter to combine a query filter (wrapping the wildcard query) with the term filter instead of a FilteredQuery.
To make just the filter work the way you need, try something like the query below (I have not tried it myself).
{
"query" : {
"filtered" : {
"query" : {
"wildcard" : { "referer" : "*.domain.com*" }
},
"filter" : {
"term" : { "first" : "1" }
}
}
},
"facets" : {
"site_id" : {
"terms" : {
"field" : "site",
"size" : "70"
}
}
}
}
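For readers on newer Elasticsearch versions, the same request would look different: facets were removed in 2.0 in favour of aggregations, and the wildcard and term conditions can sit together in a bool query. The sketch below builds such a body. The function name and its parameters are invented for illustration, and the field names are taken from the question.

```python
import json


def wildcard_with_term(wild_field, pattern, term_field, term_value,
                       agg_name, agg_field, agg_size=70):
    """Build a body combining a wildcard query, a term filter and a terms agg."""
    return {
        "query": {
            "bool": {
                "must": {"wildcard": {wild_field: pattern}},
                # filter clauses restrict the hits without contributing to the score
                "filter": {"term": {term_field: term_value}},
            }
        },
        # aggregations are computed over the documents matched by the query
        "aggs": {agg_name: {"terms": {"field": agg_field, "size": agg_size}}},
    }


body = wildcard_with_term("referer", "*.domain.com*", "first", "1", "site_id", "site")
print(json.dumps(body, indent=2))
```

Since the term condition is part of the query itself, the aggregation only counts documents that pass both conditions.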

Resources