In Elasticsearch, how do I combine multiple filters with OR without affecting the score? - elasticsearch

In Elasticsearch, I want to filter my results with two different clauses combined with OR, e.g. return documents where PropertyA=true OR PropertyB=true.
I've been trying to do this using a bool query. My base query is just a text search in must. If I put both clauses in the filter occurrence type, it aggregates them with an AND. If I put both clauses in the should occurrence type with minimum_should_match set to 1, then I get the right results. But then, documents matching both conditions get a higher score because "should" runs in a query context.
How do I filter to only documents matching either of two conditions, without increasing the score of documents matching both conditions?
Thanks in advance

Wrap the clauses in a constant_score query so that everything runs in filter context and matching both conditions does not raise the score:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            {
              "term": {
                "PropertyA": true
              }
            },
            {
              "term": {
                "PropertyB": true
              }
            }
          ]
        }
      }
    }
  }
}
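If the text search from the question should still determine the score, one option is to keep it in must and move the OR into a nested bool inside filter. Below is a minimal sketch in Python that only builds the request body. The helper name, the field name "title" and the search text are placeholders, not taken from the question.

```python
import json


def text_search_with_or_filter(text_field, text, flag_fields):
    """Build a bool query: a scored text search plus an unscored OR filter."""
    # Each flag becomes a term clause; the nested bool ORs them together.
    or_clauses = [{"term": {field: True}} for field in flag_fields]
    return {
        "query": {
            "bool": {
                # must runs in query context, so only the text match is scored.
                "must": [{"match": {text_field: text}}],
                # filter runs in filter context: matching one flag or both
                # makes no difference to the score.
                "filter": [
                    {"bool": {"should": or_clauses, "minimum_should_match": 1}}
                ],
            }
        }
    }


# "title" and "some text" are hypothetical placeholders.
body = text_search_with_or_filter("title", "some text", ["PropertyA", "PropertyB"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. The constant_score form above remains the simpler choice when no clause needs to be scored at all.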

Related

What is the need for "bool" in queries with single must/should/must_not?

In Elasticsearch I can use a bool query with a single must/should clause. Is there any point in doing that?
This example, with only one must inside bool, works:
GET /logstash-2021.02.25/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "level": "Error"
          }
        }
      ]
    }
  }
}
This other example, without the "bool" (since there is only one must), doesn't work:
GET /logstash-2021.02.25/_search
{
  "query": {
    "must": [
      {
        "match": {
          "level": "Error"
        }
      }
    ]
  }
}
A bool query matches documents that satisfy a boolean combination of other queries. It maps to Lucene's BooleanQuery and is built from one or more boolean clauses, each with a typed occurrence: must, filter, should, or must_not. These occurrence types are not queries themselves; they exist only inside bool. That is why the second example fails: the "query" object expects a query type such as match or bool as its key, and must is not one. With a single clause you can also drop bool entirely and put the match query directly under "query".
The bool query is widely used for complex combinations of filters, because it lets Elasticsearch evaluate all the combined clauses in a single request and return the most suitable results.
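To illustrate the two valid shapes for a single clause, here is a small Python sketch that builds both request bodies. The helper name is made up for this example, and the field and value are the ones from the question.

```python
import json


def single_clause_query(field, text, wrap_in_bool=False):
    """Return a search body for one match clause, with or without bool."""
    match_clause = {"match": {field: text}}
    if wrap_in_bool:
        # "must" is an occurrence type of bool, so it needs the bool wrapper.
        return {"query": {"bool": {"must": [match_clause]}}}
    # A single clause can sit directly under "query"; no bool is required.
    return {"query": match_clause}


print(json.dumps(single_clause_query("level", "Error"), indent=2))
print(json.dumps(single_clause_query("level", "Error", wrap_in_bool=True), indent=2))
```

Both bodies select the same documents. The bool wrapper only becomes necessary once several clauses have to be combined.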

How do nested bool queries work? Is there a single wrapper that can disable scoring on all the nested levels?

If you wrap a bool query in a constant_score query, does it still calculate scores for the inner queries? Is there another easy way to disable scoring?
Hi, I have an update: I have a query where no scoring is required, so I wrote it in two forms and load-tested both with 10000 documents.
These are the two query structures I load-tested:
{
  "query": {
    "bool": {
      "filter": [
        {
          "must": {
            "bool": {
              "must": [
                {
                  bool:.......
                }
              ]
            }
          }
        }
      ]
    }
  }
}
And the second one is:
{
  "query": {
    "bool": {
      "filter": [
        {
          "filter": {
            "bool": {
              "filter": [
                {
                  bool:.......
                }
              ]
            }
          }
        }
      ]
    }
  }
}
What I found was that the first query took almost twice as long as the second. I would like to know why this happened.
Also, do the internal bool queries inside filter in the first example run in query context or filter context?
I have read the Elasticsearch documentation and cannot find references or details on how it works internally.
Thanks in advance!
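A query can run in one of two contexts in Elasticsearch: query context and filter context. Query context determines how well a document matches the query, i.e. it calculates a score, whereas filter context only determines whether a document matches, and no scoring is done.
To answer your question: if you don't want scoring for a bool query, simply put it in filter context. Clauses nested inside a filter clause are not scored either, so a single outer filter (or a constant_score wrapper) covers all nested levels. More information on query and filter context can be found in the Elasticsearch documentation.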
Caching is probably why. See the documentation: "Frequently used filters will be cached automatically by Elasticsearch, to speed up performance."
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
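The advice to move the whole query into filter context can be sketched as two small Python wrappers that build the request body. The helper names and the "status" field are hypothetical and used only for illustration.

```python
import json


def as_constant_score(query):
    """Wrap a query clause so that every match gets the same constant score."""
    # constant_score runs its inner query in filter context.
    return {"constant_score": {"filter": query}}


def as_bool_filter(query):
    """Wrap a query clause in the filter occurrence of an outer bool."""
    # A bool made only of filter clauses does not compute relevance scores.
    return {"bool": {"filter": [query]}}


# Hypothetical nested query; "status" is not a field from the question.
nested = {"bool": {"must": [{"term": {"status": "active"}}]}}

print(json.dumps({"query": as_constant_score(nested)}, indent=2))
print(json.dumps({"query": as_bool_filter(nested)}, indent=2))
```

Either wrapper is applied once at the top level; the nested bool does not have to be rewritten clause by clause.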

Elasticsearch join-like query on multiple types and different fields

I have an Elasticsearch index called my_index which contains documents of two types, Type1 and Type2.
The two document types contain different data about the same type of entity.
The two document types both contain the ID of the related entity.
I've been trying to construct a join-like query which would return entities which match conditions on both document types, but I can't get it to work, and I also can't find any citation in the Elasticsearch multi-type or query documentation that says it's not possible.
The problem I'm trying to solve is avoiding having to manually join two result sets by getting all Type1 hits and all Type2 hits and doing the join outside of Elasticsearch, since the index has millions of documents.
The equivalent in SQL would be
select * from
Type1 inner join Type2
on Type2.EntityId = Type1.EntityId
where
Type1.Field = Condition AND
Type2.Field = Condition [...]
The URL I'm using to query against is http://elastic/my_index/Type1,Type2/_search to include both document types.
If I perform a blank query against this URL, I get hits of both Type1 and Type2.
If I add a criterion for Type1, it works as expected:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "FieldOnType1": "lorem" } }
      ]
    }
  }
}
Somehow Elasticsearch can infer that FieldOnType1 is indeed a field on Type1.
When I add a criterion for Type2, I don't get any hits:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "FieldOnType1": "lorem" } },
        { "term": { "FieldOnType2": "ipsum" } }
      ]
    }
  }
}
In reality, there are sometimes more than 2 term queries, or range queries and term queries.
I'm guessing the problem with the above query is that no single document can match both criteria at once.
I've tried:
- using should instead of must,
- qualifying the field names with type names,
- many variations of the query (including using filters instead of queries),
but everything gives me 0 hits.
Similar questions here suggest to use the Elasticsearch multi-search API instead of the search API, but that won't solve my "manual join" problem.
Is there a way to make an elaborate "OR" query that would allow queries on both types? Or something else?
Try a multi_match query (I use ES 6, where each index holds a single type, so I search across two indices):
GET index1,index2/_search
{
  "query": {
    "multi_match": {
      "query": "1",
      "fields": ["FieldOnType1", "FieldOnType2"]
    }
  }
}
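If you need to match different values on different fields, a bool query with should clauses should work. Note that it returns documents matching either clause rather than joining them by entity ID: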
GET test,test1/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {"firstName": "john"}
        },
        {
          "term": {"firstName1": "jerry1"}
        }
      ]
    }
  }
}
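Since the should query returns hits of both kinds without pairing them, the matching by entity ID still has to happen on the client. Here is a minimal Python sketch of that step. It assumes the hits have already been fetched and reduced to their _source dictionaries, and that the shared field is called EntityId as in the SQL from the question. The sample documents are invented.

```python
def join_by_entity(type1_docs, type2_docs, key="EntityId"):
    """Inner-join two lists of documents on a shared entity ID."""
    # Index the second list by entity ID so each lookup is a dict access.
    type2_by_id = {}
    for doc in type2_docs:
        type2_by_id.setdefault(doc[key], []).append(doc)

    joined = []
    for doc in type1_docs:
        # Pair each Type1 document with every Type2 document of the same entity.
        for other in type2_by_id.get(doc[key], []):
            joined.append((doc, other))
    return joined


# Invented sample data standing in for the two filtered result sets.
type1_docs = [
    {"EntityId": 1, "FieldOnType1": "lorem"},
    {"EntityId": 2, "FieldOnType1": "lorem"},
]
type2_docs = [
    {"EntityId": 2, "FieldOnType2": "ipsum"},
    {"EntityId": 3, "FieldOnType2": "ipsum"},
]

for left, right in join_by_entity(type1_docs, type2_docs):
    print(left["EntityId"], left["FieldOnType1"], right["FieldOnType2"])
```

Applying each type's conditions in the query first keeps the two lists as small as possible before they are joined.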

How to specify the execution order of filter and query in an Elasticsearch query

Consider the following query in Elasticsearch:
GET nyc_visionzero/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "fuzzy": {
            "on_street_name": "AVENUE"
          }
        }
      ],
      "filter": {
        "term": {
          "borough": "MANHATTAN"
        }
      }
    }
  }
}
Is the filter part executed first and then the fuzzy query, or is it the other way around? What if I want to change the order of their execution? How can I do that?
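This question relates to the query vs. filter context topic. Everything in the query context (here query.bool.must) counts toward the score of a document, whereas the conditions in the filter context (here query.bool.filter) are a yes/no decision.
So from a performance perspective, filters are faster and can be cached. On the other hand, queries allow for some fuzziness.
As for the order itself: it cannot be set in the query. Elasticsearch (through Lucene) decides internally which clause to evaluate first, based on the estimated cost of each clause, not on its position in the request.
There is a much more detailed explanation of this in the Elasticsearch docs on query and filter context.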

How to use multifield search in elasticsearch combining should and must clause

This may be a repeated question, but I haven't found a good solution.
I'm trying to search Elasticsearch for documents that contain:
- "event":"myevent1"
- "event":"myevent2"
- "event":"myevent3"
A single document need not contain all of them, but the result should contain only documents that have one of those event types.
This part is simple, because the should clause returns exactly what I want.
But then I want every document to also satisfy another condition: the field result.example.example = 200. This must hold for every single document, and in addition the document should have one of the previously described events.
So, for example, one document has "event":"myevent1" and result.example.example = 200, another has "event":"myevent2" and result.example.example = 200, and so on.
I've tried this configuration:
{
  "query": {
    "bool": {
      "must": {"match": {"operation.result.http_status": 200}},
      "should": [
        {
          "match": {
            "event": "bank.account.patch"
          }
        },
        {
          "match": {
            "event": "bank.account.add"
          }
        },
        {
          "match": {
            "event": "bank.user.patch"
          }
        }
      ]
    }
  }
}
but it is not working, because I also get documents that do not match any of the should clauses.
I hope I explained it well.
Thanks in advance!
As is, your query tells ES to look for documents that must have "operation.result.http_status":200 and merely to boost those that have a matching event type: when a bool query contains a must clause, its should clauses are optional by default.
You're looking to combine two must queries:
- one that matches one of your event types,
- one for your other condition.
The event clause accepts multiple values, and those values are exact matches: you're looking for a terms query.
Try:
{
  "query": {
    "bool": {
      "must": [
        {"match": {"operation.result.http_status": 200}},
        {
          "terms": {
            "event": [
              "bank.account.patch",
              "bank.account.add",
              "bank.user.patch"
            ]
          }
        }
      ]
    }
  }
}
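An alternative that keeps the original match clauses is to leave them in should and set minimum_should_match to 1, so that at least one event has to match. Here is a minimal Python sketch that builds such a request body. The helper name is made up, and the field names and values are the ones from the question.

```python
import json


def status_and_any_event(status_field, status, event_field, events):
    """Require the status clause and at least one of the event clauses."""
    return {
        "query": {
            "bool": {
                # The status condition must hold for every returned document.
                "must": [{"match": {status_field: status}}],
                # One match clause per allowed event type.
                "should": [{"match": {event_field: event}} for event in events],
                # Without this, should is optional when must is present.
                "minimum_should_match": 1,
            }
        }
    }


body = status_and_any_event(
    "operation.result.http_status",
    200,
    "event",
    ["bank.account.patch", "bank.account.add", "bank.user.patch"],
)
print(json.dumps(body, indent=2))
```

The should clauses still contribute to the score in this form; the terms variant above avoids per-event match scoring and is shorter when the values are exact.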
