Elasticsearch should query without computing relevance (_score) - performance

I'm building filtering queries that operate on two fields, and I would like Elasticsearch to skip computing relevance. How can I express an OR condition without moving into query context?
My simplified model has two boolean fields:
{
is_opened,
is_send
}
I'd like to build a query with this logic:
(is_opened == true AND is_send == true) OR (is_opened == false)
In other words I want to exclude documents with fields:
is_opened == true AND is_send == false
My query looks like this:
GET documents/default/_search
{
"query": {
"bool": {
"should": [
{
"bool":{
"must":[
{"term": {"is_opened":true}},
{"term": {"is_send":true}}
]
}
},
{
"bool":{
"must":[
{"term": {"is_opened":false}}
]
}
}
]
}
}
}
Logically it works as I expected, but Elasticsearch computes relevance.
I don't need it because in the end I sort the results by another field, so this is a place to optimize the queries.
I ask because "frequently used filters will be cached automatically by Elasticsearch, to speed up performance."
My results have the _score field computed, so I think the above query is executed in query context and Elasticsearch won't cache it automatically.
In the future I would like to create queries that operate on status fields, where the logic will be more complicated, so I still need to know how to prevent _score from being computed.
I noticed that changing should to filter stops _score from being computed, but it then works like the must operator. Is it possible to change the behavior of filter?
Is it possible to use a query other than should?
How can I force Elasticsearch to stop computing _score?

Simply wrap your query inside the constant_score query:
GET documents/default/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"is_opened": true
}
},
{
"term": {
"is_send": true
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"is_opened": false
}
}
]
}
}
]
}
}
}
}
}
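As a minimal sketch of the same idea from client code, the Python below builds the request body as plain dictionaries. The helper names `term` and `without_scoring` are hypothetical and not part of any Elasticsearch client; they only illustrate that a query can be wrapped in `constant_score`, or alternatively placed under `bool`/`filter`, so that it runs in filter context.

```python
import json

def term(field, value):
    """Build a term query for an exact value."""
    return {"term": {field: value}}

def without_scoring(query):
    """Wrap a query in constant_score so it runs in filter context (hypothetical helper)."""
    return {"constant_score": {"filter": query}}

# (is_opened AND is_send) OR (NOT is_opened), expressed with bool/should.
condition = {
    "bool": {
        "should": [
            {"bool": {"must": [term("is_opened", True), term("is_send", True)]}},
            term("is_opened", False),
        ]
    }
}

# Option 1: constant_score, as in the answer above.
body_constant_score = {"query": without_scoring(condition)}

# Option 2: the same condition under bool/filter; filter clauses do not affect scoring.
body_bool_filter = {"query": {"bool": {"filter": [condition]}}}

if __name__ == "__main__":
    print(json.dumps(body_constant_score, indent=2))
```

Either body can be sent as the JSON payload of the `_search` request. Because the inner `bool` has only `should` clauses, at least one of them still has to match, so the OR logic is preserved while the reported `_score` becomes a constant.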

Related

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to extract multiple records based on multiple values (email_address) passed to a bool query, within a date range. For example, I want to extract information about a few employees based on their email_address (annie@test.com, charles@test.com, heman@test.com) from a given period, i.e. project_date (2019-01-01).
I used a should expression, but unfortunately it pulls all the records from Elasticsearch that fall in the date range, i.e. it also pulls other employees' information from project_date 2019-01-01.
{
"query": {
"bool": {
"should": [
{ "match": { "email_address": "annie@test.com" }},
{ "match": { "email_address": "chalavadi@test.com" }}
],
"filter": [
{ "range": { "project_date": { "gte": "2019-08-01" }}}
]
}
}
}
I also tried a must expression but got no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
Should (OR) clauses are optional
Quoting from this article.
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, should only influences the score and does not actually filter documents. You must wrap should in must, or move it into filter (if scoring is not required).
GET employeeindex/_search
{
"query": {
"bool": {
"filter": {
"range": {
"projectdate": {
"gte": "2019-01-01"
}
}
},
"must": [
{
"bool": {
"should": [
{
"term": {
"email.raw": "abc@text.com"
}
},
{
"term": {
"email.raw": "efg@text.com"
}
}
]
}
}
]
}
}
}
You can also replace the should clause with a terms clause, as in @AlwaysSunny's answer.
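Another option, not taken from the answers in this thread, is to keep `should` next to `filter` and set `minimum_should_match` to 1, which makes at least one `should` clause mandatory. The Python sketch below builds such a body and uses a tiny in-memory matcher to show how the same clauses behave with and without that setting. The matcher is a simplification written for this example, not Elasticsearch's implementation: it treats `term` as plain equality and ignores text analysis, so on a real index the exact-match field would likely be a keyword field such as `email_address.keyword`. The sample documents are made up.

```python
def matches(query, doc):
    """Very small matcher for term, range (gte only) and bool queries; a simplification."""
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "range":
        field, bounds = next(iter(body.items()))
        return field in doc and doc[field] >= bounds["gte"]
    if kind == "bool":
        must = body.get("must", []) + body.get("filter", [])
        should = body.get("should", [])
        if not all(matches(q, doc) for q in must):
            return False
        # Without must/filter, at least one should clause has to match;
        # otherwise should is optional unless minimum_should_match says so.
        default_min = 0 if must else 1
        needed = body.get("minimum_should_match", default_min) if should else 0
        return sum(matches(q, doc) for q in should) >= needed
    raise ValueError(f"unsupported query type: {kind}")

def email_query(emails, since, require_email):
    """Build should + filter, optionally forcing one should clause to match."""
    body = {
        "should": [{"term": {"email_address": e}} for e in emails],
        "filter": [{"range": {"project_date": {"gte": since}}}],
    }
    if require_email:
        body["minimum_should_match"] = 1
    return {"bool": body}

# Made-up sample documents; ISO dates compare correctly as strings.
docs = [
    {"email_address": "annie@test.com", "project_date": "2019-09-01"},
    {"email_address": "someone.else@test.com", "project_date": "2019-09-15"},
    {"email_address": "annie@test.com", "project_date": "2019-01-10"},
]

loose = email_query(["annie@test.com"], "2019-08-01", require_email=False)
strict = email_query(["annie@test.com"], "2019-08-01", require_email=True)

loose_hits = [d for d in docs if matches(loose, d)]
strict_hits = [d for d in docs if matches(strict, d)]
```

In this sketch the loose query keeps every document inside the date range, including the other employee's, while the strict one keeps only the requested address within that range.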
You can do it more concisely with terms and range inside filter. Your existing query doesn't work as expected because of the should clause: next to a filter it becomes optional, so it no longer restricts the results. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
"query": {
"bool": {
"filter": [
{
"terms": {
"email_address.keyword": [
"annie@test.com", "chalavedi@test.com"
]
}
},
{
"range": {
"project_date": {
"gte": "2019-08-01"
}
}
}
]
}
}
}

ElasticSearch query with MUST and SHOULD

I have this query to get data from an AWS Elasticsearch instance (v6.2):
{
"query": {
"bool": {
"must": [
{
"term": {"logLevel": "error"}
},
{
"bool": {
"should": [
{
"match": {"EventCategory": "Home Management"}
}
]
}
}
],
"filter": [{
"range": { "timestamp": { "gte": 155254550880 }}
}
]
}
},
"size": 10,
"from": 0
}
My data has multiple EventCategory values, for example 'Home Management' and 'User Account Management'. The problem is that match inside should returns all the data, because the word 'Management' appears in both categories. If I use term instead of match, it doesn't return anything at all, even when the given value is exactly the same as in the document.
I need to get data when any of the given categories matches, together with the rest of the filters.
EDIT:
None, one, or more than one EventCategory may be passed to the should clause.
I'm not sure why you added a should within a must. Do you expect to have more than one should clause? It looks a bit odd.
As for your question, the term query does not analyse its input, so it won't match the original value of an analysed text field; use it on keyword-typed fields. If your EventCategory field has the default mapping, you can run the term query against its default non-analysed sub-field, EventCategory.keyword, as follows:
...
{
"term": { "EventCategory.keyword": "Home Management" }
}
...
Furthermore, if you just want to filter documents in or out without caring about their relevance, I'd recommend moving all the conditions into the filter block, to speed up your query and make better use of the cache.
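The two suggestions above (exact matching on the keyword sub-field and moving every condition into `filter`) can be combined. The Python sketch below builds such a request body for zero, one, or several categories, as the edit to the question asks. The function name `build_log_query` is made up for illustration, and `EventCategory.keyword` assumes the default dynamic mapping mentioned above.

```python
def build_log_query(categories, since, size=10, offset=0):
    """Build a filter-only search body; the category filter is added only when categories are given."""
    filters = [
        {"term": {"logLevel": "error"}},
        {"range": {"timestamp": {"gte": since}}},
    ]
    if categories:
        # terms matches documents whose field equals any of the listed values.
        filters.append({"terms": {"EventCategory.keyword": list(categories)}})
    return {
        "query": {"bool": {"filter": filters}},
        "size": size,
        "from": offset,
    }

no_category = build_log_query([], 155254550880)
two_categories = build_log_query(
    ["Home Management", "User Account Management"], 155254550880
)
```

With an empty list the category condition is simply left out, so only the log level and timestamp restrict the results; with several values a single `terms` clause stands in for the nested `should`.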
The query below should work.
I've removed should and created two must clauses, one each for home and management. Note that the query is meant for fields of the text datatype.
{
"query":{
"bool":{
"must":[
{
"term":{
"logLevel":"error"
}
},
{
"match":{
"EventCategory":"home"
}
},
{
"match":{
"EventCategory":"management"
}
}
],
"filter":[
{
"range":{
"timestamp":{
"gte":155254550880
}
}
}
]
}
},
"size":10,
"from":0
}
Hope it helps!

Elasticsearch query to match two different fields with exact values

I want to find the records in my Elasticsearch index where the field "connectorSpecific.hostname.keyword" has the value "tyco-fire.com" and the field "hasForms" has the value true.
Below is my Elasticsearch query:
GET index1/_search
{
"query": {
"bool": {
"should": [
{ "match": { "connectorSpecific.hostname.keyword": "tyco-fire.com" }},
{ "match": { "hasForms": true }}
]
}
}
}
This query also returns records where "hasForms" is false, and I'm not sure why. I am using a bool should query. Any help is appreciated.
With only should clauses, a document needs to match just one of them, which is why records with "hasForms" false come back. If you want both constraints to match, use bool/filter (bool/must would work as well, but since you're doing exact matching you don't need scoring at all), like this:
GET index1/_search
{
"query": {
"bool": {
"filter": [
{ "match": { "connectorSpecific.hostname.keyword": "tyco-fire.com" }},
{ "match": { "hasForms": true }}
]
}
}
}
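For illustration, the same filter can be generated from a mapping of field names to exact values. This is only a sketch: the helper name `exact_filter` is hypothetical, and it swaps in `term` queries for `match`, on the assumption that on keyword and boolean fields both perform an exact comparison.

```python
def exact_filter(conditions):
    """Build a bool query whose filter clauses must all match (logical AND)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {field: value}} for field, value in conditions.items()
                ]
            }
        }
    }

body = exact_filter({
    "connectorSpecific.hostname.keyword": "tyco-fire.com",
    "hasForms": True,
})
```

Every entry in the mapping becomes one `filter` clause, so adding a third exact-match constraint is a one-line change.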

Elastic Search Filter performing much slower than Query

As my ES index/cluster has scaled up (~2 billion docs now), I have noticed a more significant performance loss. So I started messing around with my queries to see if I could squeeze some perf out of them.
As I did this, I noticed that when I used a Boolean Query in my Filter, my results would take about 3.5-4 seconds to come back. But if I do the same thing in my Query, it is more like 10-20ms.
Here are the 2 queries:
Using a filter
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[{"match_all":{}}]}},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
Using a query
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]}}
}
Like I said, the second method where I don't use a Filter at all takes mere milliseconds, while the first query takes almost 4 seconds. This seems completely backwards from what the documentation says. They say that the Filter should actually be very quick and the Query should be the one that takes longer. So why am I seeing the exact opposite here?
Could it be something with my index mapping? If anyone has any idea why this is happening I would love to hear suggestions.
Thanks
The root filter element is actually another name for the post_filter element. The top-level filter was supposed to be removed in ES 1.1, but it slipped through and exists in 2.x versions as well.
It is removed completely in ES 5 though.
So, your first query is not a "filter" query. It's a query whose results are first used in aggregations (if applicable), and only then is the post_filter/filter applied to the results. So you basically have a two-step process there: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-post-filter.html
More about its performance here:
While we have gained cacheability of the tag filter, we have potentially increased the cost of scoring significantly. Post filters are useful when you need aggregations to be unfiltered, but hits to be filtered. You should not be using post_filter (or its deprecated top-level synonym filter) if you do not have facets or aggregations.
A proper filter query is the following:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [],
"must": [
{
"match_all": {}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
}
}
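The `filtered` query shown above was itself deprecated in Elasticsearch 2.0 and removed in 5.0, where the `filter` clause of the `bool` query takes its place. As a sketch for those later versions, the Python below rebuilds the same conditions in that form; the helper name `filter_only_body` is hypothetical.

```python
import json

def filter_only_body(values_by_field):
    """Build a search body that runs every condition in filter context via bool/filter."""
    clauses = []
    for field, value in values_by_field.items():
        # A list becomes a terms query (any of the values); a scalar becomes a term query.
        kind = "terms" if isinstance(value, list) else "term"
        clauses.append({kind: {field: value}})
    return {"query": {"bool": {"filter": clauses}}}

body = filter_only_body({
    "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b",
    "subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0",
    "subscriptionType": 0,
    "entityType": ["4"],
})

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

No `match_all` clause is needed in this form, since a `bool` query with only `filter` clauses already selects the matching documents without scoring them.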
A filter is faster. Your problem is that you include the match_all query in your filter case. This matches on all 2 billion of your documents. A set operation has to then be done against the filter to cull the set. Omit the query portion in your filter test and you'll see that the results are much faster.

Elasticsearch WHERE clause with constant rank: how to do this?

I'm new to Elasticsearch. How do I write the Elasticsearch equivalent of this query?
select * from response where pnrno='sampleid'
I know we have to use the 'filter' option in Elasticsearch, but we do not need any ranking (the ranking can be constant). How can I write a query that achieves this?
You are correct: you can use the filtered query with an empty query clause and only filters. A filter narrows the set of documents on which the query then acts to further match and calculate relevance. Filters are boolean: a document either matches or is rejected (1/0).
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"term": {
"FIELD": "VALUE"
}
}]
}
}
}
}
}
The usual way of achieving this is by using the constant_score query with an embedded term filter, like this:
{
"query": {
"constant_score": {
"filter": {
"term": {
"pnrno": "sampleid"
}
}
}
}
}
