What if I use a query in filter clauses in Elasticsearch?

What if I use a query inside filter clauses in Elasticsearch? Will ES calculate a score?
For example,
case 1:
{
"query": {
"bool": {
"filter": {
"bool":{
"should":{
}
}
}
}
}
}
case 2:
{
"query": {
"bool": {
"should": {
"bool":{
"filter":{
}
}
}
}
}
}
Will ES calculate scores in these two cases?

The filter clause (query) must appear in matching documents. However, unlike
must the score of the query will be ignored. Filter clauses are
executed in filter context, meaning that scoring is ignored and
clauses are considered for caching.
Refer to the Elasticsearch documentation on bool queries to learn more.
Here is a working example with index data, search queries, and search results.
Index data:
{
"name": "milk",
"cost": 40
}
{
"name": "bread",
"cost": 55
}
Search Query 1:
In this query, the inner bool query is wrapped in the outer filter clause, so the score of the should clause is ignored.
{
"query": {
"bool": {
"filter": {
"bool": {
"should": {
"match": {
"name": "bread"
}
}
}
}
}
}
}
Search Result 1:
"hits": [
{
"_index": "64505740",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"name": "bread",
"cost": 55
}
}
]
Search Query 2:
In this query, the inner bool query contains only a filter clause, so it contributes nothing to the score even though it sits inside the outer should clause.
{
"query": {
"bool": {
"should": {
"bool": {
"filter": {
"term": {
"name": "bread"
}
}
}
}
}
}
}
Search Result 2:
"hits": [
{
"_index": "64505740",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"name": "bread",
"cost": 55
}
}
]
So both of your search queries return a score of 0.0: scoring is ignored because of the filter clause.

In Elasticsearch, queries placed under the filter section are not involved in score calculation. So in both of your queries, as long as the logic sits inside a filter, Elasticsearch won't calculate a score. If you move part of the logic into a must or should section that is not itself nested under a filter, Elasticsearch will calculate a score for it. Note that must_not clauses, like filter clauses, run in filter context and do not contribute to the score.
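The rule described in these answers can be checked mechanically. The following is a minimal sketch in Python that walks a bool query and reports, for each leaf query, whether it would run in scoring (query) context or in filter context. The function name `leaf_contexts` and the path format are made up for this illustration, and the sketch only understands bool queries whose clauses each hold a single-key query object.

```python
from typing import Any, Dict, Iterator, Tuple

# Bool clause types that run in filter context (no scoring).
FILTER_CONTEXT_CLAUSES = {"filter", "must_not"}
BOOL_CLAUSES = ("must", "should", "filter", "must_not")


def leaf_contexts(
    query: Dict[str, Any], scored: bool = True, path: str = "query"
) -> Iterator[Tuple[str, bool]]:
    """Yield (path, scored) for every non-bool leaf query (hypothetical helper)."""
    name, body = next(iter(query.items()))
    if name != "bool":
        yield f"{path}.{name}", scored
        return
    for clause in BOOL_CLAUSES:
        children = body.get(clause, [])
        if isinstance(children, dict):
            children = [children]  # a clause may hold one query or a list
        for child in children:
            if not child:
                continue  # skip empty placeholders such as {}
            # Once inside filter/must_not, everything below is unscored.
            child_scored = scored and clause not in FILTER_CONTEXT_CLAUSES
            yield from leaf_contexts(child, child_scored, f"{path}.bool.{clause}")


case_1 = {"bool": {"filter": {"bool": {"should": {"match": {"name": "bread"}}}}}}
case_2 = {"bool": {"should": {"bool": {"filter": {"term": {"name": "bread"}}}}}}

for case in (case_1, case_2):
    for leaf_path, is_scored in leaf_contexts(case):
        print(leaf_path, "scored" if is_scored else "not scored")
```

Both cases from the question report their only leaf as not scored, which agrees with the 0.0 scores shown in the search results above.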

Related

Elasticsearch Bool query with minimum_should_match set to zero not honored

I add 3 documents
POST test/_doc
{"value": 1}
POST test/_doc
{"value": 2}
POST test/_doc
{"value": 3}
Then I run the following query, which I expect to return all 3 docs, with documents matching the should clause ranked higher:
GET /test/_search
{
"query": {
"bool": {
"minimum_should_match": 0,
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
But instead I get only 2 docs (values 2 and 3). "minimum_should_match": 0 does not have any effect until I add a filter or must clause to the bool query, like below:
GET /test/_search
{
"query": {
"bool": {
"filter": [ { "match_all": { } } ],
"should": [
{
"range": {
"value": {
"gte": 2
}
}
}
]
}
}
}
What I want
Whether the must or filter clause of the bool query is empty or filled, the should clause must not filter out any documents but only participate in ranking. Please share how I can achieve that, thanks.
It's a little weird that minimum_should_match: 0 does not work with the should clause. This may be explained by the following passage from the documentation:
No matter what number the calculation arrives at, a value greater than
the number of optional clauses, or a value less than 1 will never be
used. (ie: no matter how low or how high the result of the calculation
result is, the minimum number of required matches will never be lower
than 1 or greater than the number of clauses.)
There are two ways to get all the documents in the result while using the should clause only for ranking:
Use a must or filter clause with a match_all query, which you already figured out as shown in the question above.
Another way is to add a second should clause that matches the remaining documents, and use the boost parameter to rank the preferred documents higher.
Search Query:
{
"query": {
"bool": {
"should": [
{
"range": {
"value": {
"gte": 2,
"boost": 2.0
}
}
},
{
"range": {
"value": {
"lt": 2,
"boost": 1.0
}
}
}
]
}
}
}
Search Result will be
"hits": [
{
"_index": "68040640",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"value": 2
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"value": 3
}
},
{
"_index": "68040640",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"value": 1
}
}
]
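To make the matching behaviour in this thread easier to see, here is a toy Python model of a bool query. It is only a sketch under stated assumptions: clauses are plain Python predicates, every matching must or should clause adds 1.0 to the score (real Elasticsearch scoring is more involved), and a bool query with should clauses but no must or filter clause requires at least one should clause to match. The name `run_bool` is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Doc = Dict[str, int]
Clause = Callable[[Doc], bool]


def run_bool(
    docs: Sequence[Doc],
    must: Sequence[Clause] = (),
    filters: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
) -> List[Tuple[Doc, float]]:
    """Toy bool query: return (doc, score) pairs, best score first."""
    # Assumption: without must/filter, at least one should clause must match.
    required_should = 1 if should and not must and not filters else 0
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filters):
            continue
        matched_should = sum(1 for clause in should if clause(doc))
        if matched_should < required_should:
            continue
        # Toy scoring: 1.0 per must clause and per matching should clause.
        hits.append((doc, float(len(must) + matched_should)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


docs = [{"value": 1}, {"value": 2}, {"value": 3}]
gte_2 = lambda doc: doc["value"] >= 2
match_all = lambda doc: True

only_should = run_bool(docs, should=[gte_2])
with_filter = run_bool(docs, filters=[match_all], should=[gte_2])
print([doc["value"] for doc, _ in only_should])
print([(doc["value"], score) for doc, score in with_filter])
```

In this model the should-only query drops the document with value 1, while adding a match_all filter keeps all three documents and leaves the should clause to affect ranking only.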

combine terms and bool query in elasticsearch

I would like to do a search in an elasticsearch index but only for a list of ids. I can select the ids with a terms query
{
"query": {
"terms": {
"_id": list_of_ids
}
}
}
Now I want to search in the resulting list, which can be done with a query like this
{
"query": {
"bool": {
"must": {}
}
}
}
My question is how can I combine those two queries?
One solution I found is to add the ids as term queries in a should clause, like this
{
"query": {
"bool": {
"must": {},
"should": [
{
"term": {
"_id": id1
}
},
{
"term": {
"_id": id2
}
}
]
}
}
}
which works fine. However, if the list of ids is very large it can lead to errors.
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query:
I am wondering whether there is a more compact way to write such a query? I think the error above is caused by my query just being too long since I added thousands of term searches... there must be a way to just provide an array, like in the terms query?
Solved it:
{
"query": {
"bool": {
"must": {},
"filter": {
"terms": {
"_id": list_of_ids
}
}
}
}
}
sorry I am a bit of a newbie to elasticsearch...
You can also use the IDs query, which returns documents based on their IDs.
Here is a working example with index data, search query, and search result.
Index Data:
{
"name":"buiscuit",
"cost":"55",
"discount":"20"
}
{
"name":"multi grain bread",
"cost":"55",
"discount":"20"
}
Search Query:
{
"query": {
"bool": {
"must": {
"match": {
"name": "bread"
}
},
"filter": {
"ids": {
"values": [
"1",
"2",
"4"
]
}
}
}
}
}
Search Result:
"hits": [
{
"_index": "65431114",
"_type": "_doc",
"_id": "1",
"_score": 0.5754429,
"_source": {
"name": "multi grain bread",
"cost": "55",
"discount": "20"
}
}
]
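If the query is assembled in client code, the list of IDs can be passed as one array instead of thousands of separate term clauses. The following is a minimal Python sketch that builds the request body shown above. The function name `search_within_ids` is made up for this example, and removing duplicate IDs is a convenience added here, not something the answers require.

```python
import json
from typing import Any, Dict, Iterable


def search_within_ids(must_clause: Dict[str, Any], ids: Iterable[str]) -> Dict[str, Any]:
    """Build a bool query that scores with must_clause but only inside the given IDs."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order
    return {
        "query": {
            "bool": {
                "must": must_clause,
                # The ids query sits in filter context, so it restricts
                # the result set without affecting the score.
                "filter": {"ids": {"values": unique_ids}},
            }
        }
    }


body = search_within_ids({"match": {"name": "bread"}}, ["1", "2", "2", "4"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a _search request with whatever client is in use.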

Query and exclude in ElasticSearch

I'm trying to use the match_phrase_prefix query with an exclude query, so that it matches all terms except for the terms to be excluded. I have it figured out in a basic URI query, but not the regular JSON query. How do I convert this URI into a JSON type query?
"http://127.0.0.1:9200/topics/_search?q=name:"
+ QUERY + "* AND !name=" + CURRENT_TAGS
Where CURRENT_TAGS is a list of tags not to match with.
This is what I have so far:
{
"query": {
"bool": {
"must": {
"match_phrase_prefix": {
"name": "a"
}
},
"filter": {
"terms": {
"name": [
"apple"
]
}
}
}
}
}
However, when I do this apple is still included in the results. How do I exclude apple?
You are almost there. You can use must_not, which is part of the bool query, to exclude the documents you don't want. Below is a working example based on your sample.
Index mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index sample docs named apple and amazon (the world's biggest companies), both of which match your search criteria :)
Search query to exclude apple
{
"query": {
"bool": {
"must": {
"match_phrase_prefix": {
"name": "a"
}
},
"must_not": {
"match": {
"name": "apple"
}
}
}
}
}
Search results
"hits": [
{
"_index": "matchprase",
"_type": "_doc",
"_id": "2",
"_score": 0.6931471,
"_source": {
"name": "amazon"
}
}
]
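Since the question mentions a list of tags to exclude, the same query can be generated for any number of tags. The sketch below is one possible way to do that in Python, assuming one match clause per excluded tag under must_not. The function name `prefix_excluding` and its parameters are hypothetical.

```python
import json
from typing import Any, Dict, Sequence


def prefix_excluding(field: str, prefix: str, excluded: Sequence[str]) -> Dict[str, Any]:
    """Build a prefix search on `field` that leaves out every value in `excluded`."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase_prefix": {field: prefix}},
                # A document is dropped if it matches any must_not clause.
                "must_not": [{"match": {field: tag}} for tag in excluded],
            }
        }
    }


current_tags = ["apple", "avocado"]  # example values, not from the question
body = prefix_excluding("name", "a", current_tags)
print(json.dumps(body, indent=2))
```

With a single excluded tag this produces the same logic as the query above, except that must_not holds a one-element list instead of a single object.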

Why does Elasticsearch calculate a score for term queries?

I want to make a simple query based on knowing a unique field value using a term query. For instance:
{
"query": {
"term": {
"products.product_id": {
"value": "Ubsdf-234kjasdf"
}
}
}
}
Regarding term queries, Elasticsearch documentation states:
Returns documents that contain an exact term in a provided field.
You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
On the other hand, documentation also suggests that the _score is calculated for queries where relevancy matters (and is not the case for filter context which involves exact match).
I find it a bit confusing. Why does Elasticsearch calculate _score for term queries, which are supposed to be concerned with exact matches and not relevancy?
Term queries are not analyzed, so the search term skips the analysis phase and is matched exactly. Their score is still calculated, however, when they are used in query context.
When you use term queries in filter context, you are filtering on them rather than searching on them, so no score is calculated.
More info on query and filter context is available in the official ES docs.
The examples below show a term query in both query context and filter context.
Term query in query context
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "c"
}
}
]
}
},
"size": 10
}
And result with a score
"hits": [
{
"_index": "cpp",
"_type": "_doc",
"_id": "4",
"_score": 0.2876821, --> notice score is calculated
"_source": {
"title": "c"
}
}
]
Term query in filter context
{
"query": {
"bool": {
"filter": [ --> previous clause replaced by `filter`
{
"term": {
"title": "c"
}
}
]
}
},
"size": 10
}
And search result with filter context
"hits": [
{
"_index": "cpp",
"_type": "_doc",
"_id": "4",
"_score": 0.0, --> notice score is 0.
"_source": {
"title": "c"
}
}
]
Filter context means that you need to wrap your term query inside a bool/filter query, like this:
{
"query": {
"bool": {
"filter": {
"term": {
"products.product_id": {
"value": "Ubsdf-234kjasdf"
}
}
}
}
}
}
The above query will not compute scores.
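For lookups by a unique value, the wrapping can be hidden behind a small helper. This is a minimal Python sketch that builds the request body shown above. The function name `exact_lookup` is made up for this example.

```python
import json
from typing import Any, Dict


def exact_lookup(field: str, value: str) -> Dict[str, Any]:
    """Build a request body that matches `field` exactly, in filter context."""
    return {
        "query": {
            "bool": {
                # filter context: the term query only includes or excludes documents
                "filter": {"term": {field: {"value": value}}}
            }
        }
    }


body = exact_lookup("products.product_id", "Ubsdf-234kjasdf")
print(json.dumps(body, indent=2))
```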

Sort based on the service time of stores

My project contains some stores with their working time and I index them in ElasticSearch. Now there are some scenarios in my product:
Whenever the client requests the stores that are open now, I use the following range filter:
bool: {
must: [
{ range: {startTime: { lte: now}} },
{ range: {endTime: { gte: now}} }
]
}
Let's call the result Online stores.
When the client requests all stores, I have to return all the documents, sorted with online stores first and then the other stores.
I can do that with two queries, one for online and another for offline stores, but I want to do it in a single query. Any idea?
You can achieve this by using should as an "optional" clause:
If the bool query is in a query context and has a must or filter
clause then a document will match the bool query even if none of the
should queries match. In this case these clauses are only used to
influence the score.
The bool query takes a more-matches-is-better approach, so the score
from each matching must or should clause will be added together to
provide the final _score for each document.
The query might look like this:
POST my-should/doc/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"should": {
"bool": {
"must": [
{
"range": {
"startTime": {
"lte": "2018-06-24T16:39:59"
}
}
},
{
"range": {
"endTime": {
"gte": "2018-06-22T16:39:59"
}
}
}
],
"_name": "Online"
}
}
}
}
}
The must part of this bool query defines which documents match, and the should part boosts those that also match the additional criteria.
Note that here we used Named Queries to highlight that the "Online" part of the query was matched to a document. The response could look like this:
"hits": [
{
"_index": "my-should",
"_type": "doc",
"_id": "BKgZLWQBERN2JBe1CQ5t",
"_score": 3,
"_source": {
"startTime": "2018-06-23T16:39:59",
"endTime": "2018-06-23T16:39:59"
},
"matched_queries": [
"Online"
]
},
{
"_index": "my-should",
"_type": "doc",
"_id": "BagaLWQBERN2JBe12A7y",
"_score": 1,
"_source": {
"startTime": "2018-06-20T16:39:59",
"endTime": "2018-06-21T16:39:59"
}
}
]
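The same request body can be generated for the current time instead of hard-coded timestamps. The sketch below is one way to do it in Python, assuming startTime and endTime are mapped as date fields so that the date math expression now (already used in the question) is accepted. The function name `stores_online_first` is hypothetical.

```python
import json
from typing import Any, Dict


def stores_online_first(now: str = "now") -> Dict[str, Any]:
    """Build a query that returns all stores and scores open ones higher."""
    return {
        "query": {
            "bool": {
                # must: every store matches
                "must": {"match_all": {}},
                # should: optional, only raises the score of open stores
                "should": {
                    "bool": {
                        "must": [
                            {"range": {"startTime": {"lte": now}}},
                            {"range": {"endTime": {"gte": now}}},
                        ],
                        "_name": "Online",
                    }
                },
            }
        }
    }


body = stores_online_first()
print(json.dumps(body, indent=2))
```

Passing a fixed timestamp such as "2018-06-23T16:39:59" instead of the default makes the query reproducible for testing.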
Hope that helps!
