Elasticsearch "boost" not working when inside "filter" - elasticsearch

I'm trying to boost matches on a certain field over another.
This works fine:
{
"query": {
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
}
When i see the documents matched on mainField, i see they have a _score of 2.0 as expected.
But when i wrap this same query in a filter:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
]
}
}
}
The _score for all documents is 0.0.
The same thing happens for multi_match. By itself (e.g inside a query) it works fine, but inside a bool + filter, it doesn't work.
Can someone explain why this is the case? I need to wrap in a filter due to the way my app composes queries.
Some context might also help: I'm trying to return documents that match on either mainField or otherField, but sort the ones matching on mainField first, so i figured boost would be the most appropriate choice here. But let me know if there is a better way.

The filter queries are always executed in the filter context. It will always return a score of zero and only contribute to the filtering of documents.
Refer to this documentation, to know more about filter context
Due to this, you are not getting a _score of 2.0, even after applying boost, in the second query

Related

Elastic Search 7.9: exact on multiple fields

I am using the latest version of Elastic Search (7.9) and i'm trying to find a good way to do a multiple term query as an AND.
Essentially what i want do is:
select * where field1 === 'word' AND field2 === 'different word'
I am currently using term to do an exact keyword match on field1. But adding in the second field is causing me some jip.
{
"query": {
"term": {
"field1": {
"value": "word"
}
}
}
}
This is my current query, i have tried using BOOL. I came across answers in previous versions where i could maybe use a filtered query. But i cant seem to get that to work either.
Can someone help me please. It's really doing my nut.
EDIT
Here is what i've tried with a bool / must with multiple terms. But i get no results even though i know in this case this query should return data
{
"query": {
"bool": {
"must": [
{
"term": {
"field1": {
"value": "word"
}
}
},
{
"term": {
"field2": {
"value": "other word"
}
}
}
]
}
}
}
Ok, so, fun fact. You can do the bool / match as i have tried. What you should remember is that keyword matches (which is what i'm using) are case sensitive. ES doesnt do any analyzing or filtering of the data when you set the type of a field to keyword.
It's also possible to use a bool with a filter to get the same results (when you factor in a type)
{
"query": {
"bool": {
"must":
{
"term": {
"field1": {
"value": "word"
}
}
},
"filter":
{
"term": {
"field2": {
"value": "other word"
}
}
}
}
}
}

Elastic search query using python list

How do I pass a list as query string to match_phrase query?
This works:
{"match_phrase": {"requestParameters.bucketName": {"query": "xxx"}}},
This does not:
{
"match_phrase": {
"requestParameters.bucketName": {
"query": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo",
]
}
}
}
match_phrase simply does not support multiple values.
You can either use a should query:
GET _search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"requestParameters.bucketName": {
"value": "auditloggingnew2232"
}
}
},
{
"match_phrase": {
"requestParameters.bucketName": {
"value": "config-bucket-123"
}
}
}
]
},
...
}
}
or, as #Val pointed out, a terms query:
{
"query": {
"terms": {
"requestParameters.bucketName": [
"auditloggingnew2232",
"config-bucket-123",
"web-servers",
"esbck-essnap-1djjegwy9fvyl",
"tempexpo"
]
}
}
}
that functions like an OR on exact terms.
I'm assuming that 1) the bucket names in question are unique and 2) that you're not looking for partial matches. If that's the case, plus if there are barely any analyzers set on the field bucketName, match_phrase may not even be needed! terms will do just fine. The difference between term and match_phrase queries is nicely explained here.

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields a greater number of results than that of those two.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong, the nested filter cannot be an array, so I suspect ES doesn't parse it correctly and only takes one match instead of both, which is probably why it returns more data than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.

Elastic Search Filter performing much slower than Query

As my ES index/cluster has scaled up (# ~2 billion docs now), I have noticed more significant performance loss. So I started messing around with my queries to see if I could squeeze some perf out of them.
As I did this, I noticed that when I used a Boolean Query in my Filter, my results would take about 3.5-4 seconds to come back. But if I do the same thing in my Query it is more like 10-20ms
Here are the 2 queries:
Using a filter
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[{"match_all":{}}]}},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
Using a query
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]}}
}
Like I said, the second method where I don't use a Filter at all takes mere milliseconds, while the first query takes almost 4 seconds. This seems completely backwards from what the documentation says. They say that the Filter should actually be very quick and the Query should be the one that takes longer. So why am I seeing the exact opposite here?
Could it be something with my index mapping? If anyone has any idea why this is happening I would love to hear suggestions.
Thanks
The root filter element is actually another name for post_filter element. Somehow, it was supposed to be removed (the filter) in ES 1.1 but it slipped through and exists in 2.x versions as well.
It is removed completely in ES 5 though.
So, your first query is not a "filter" query. It's a query whose results are used afterwards (if applicable) in aggregations, and then the post_filter/filter is applied on the results. So you basically have a two steps process in there: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-post-filter.html
More about its performance here:
While we have gained cacheability of the tag filter, we have potentially increased the cost of scoring significantly. Post filters are useful when you need aggregations to be unfiltered, but hits to be filtered. You should not be using post_filter (or its deprecated top-level synonym filter) if you do not have facets or aggregations.
A proper filter query is the following:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [],
"must": [
{
"match_all": {}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
}
}
A filter is faster. Your problem is that you include the match_all query in your filter case. This matches on all 2 billion of your documents. A set operation has to then be done against the filter to cull the set. Omit the query portion in your filter test and you'll see that the results are much faster.

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if i have just 1 I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I havenĀ“t found a way to search with two values with must.
is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
term or terms query is the perfect way to fetch the exact text or id, using match query result in search inside the id or text
Ex:
id = '4'
id = '44'
Search using match query with id = 4 return both 4 & 44 since it matches 4 in both. This is where terms query come into play.
same search using terms query will return 4 only.
So the accepted is absolutely wrong. Use the #Rahul answer. Just one more thing you need to do, Instead of text you need to analyse the field as a keyword
Example for indexing a field both as a text and keyword (mapping is for flat level for nested change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
using #Rahul's answer doesn't worked because you might be analysed as a text.
id - access a text field
id.keyword - access a keyword field
it would be
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say accepted answer will return falsy results Please use #Rahul's answer with the corresponding mapping.

Resources