Why does Elasticsearch calculate a score for term queries?

I want to make a simple query based on knowing a unique field value using a term query. For instance:
{
"query": {
"term": {
"products.product_id": {
"value": "Ubsdf-234kjasdf"
}
}
}
}
Regarding term queries, Elasticsearch documentation states:
Returns documents that contain an exact term in a provided field.
You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
On the other hand, the documentation also suggests that the _score is calculated only for queries where relevance matters, which is not the case in filter context, where a document either matches or it does not.
I find this a bit confusing. Why does Elasticsearch calculate a _score for term queries, which are supposed to be about exact matches rather than relevance?

term queries are not analyzed: the search value skips the analysis phase and is compared as-is against the indexed terms, which is why they are used for exact matches. Their score is still calculated whenever they run in query context.
When you use a term query in filter context, you are filtering on it rather than searching on it, so no score is calculated.
More info on query and filter context is in the official ES doc.
The examples below show the same term query in query context and in filter context.
Term query in query context
{
"query": {
"bool": {
"must": [
{
"term": {
"title": "c"
}
}
]
}
},
"size": 10
}
And result with a score
"hits": [
{
"_index": "cpp",
"_type": "_doc",
"_id": "4",
"_score": 0.2876821, --> notice score is calculated
"_source": {
"title": "c"
}
}
]
Term query in filter context
{
"query": {
"bool": {
"filter": [ --> previous clause replaced by `filter`
{
"term": {
"title": "c"
}
}
]
}
},
"size": 10
}
And search result with filter context
"hits": [
{
"_index": "cpp",
"_type": "_doc",
"_id": "4",
"_score": 0.0, --> notice score is 0.
"_source": {
"title": "c"
}
}
]

Filter context means that you need to wrap your term query inside a bool/filter query, like this:
{
"query": {
"bool": {
"filter": {
"term": {
"products.product_id": {
"value": "Ubsdf-234kjasdf"
}
}
}
}
}
}
The above query will not compute scores.
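The wrapping described above can be expressed as a small helper. This is a minimal sketch in Python that only builds request bodies as dictionaries and does not talk to a cluster; the function names `in_filter_context` and `as_constant_score` are made up for illustration. The second helper uses `constant_score`, another standard way to run a query as a filter, which gives every matching document the same score.

```python
import json

def in_filter_context(leaf_query):
    """Wrap a leaf query in bool/filter so that no relevance score is computed."""
    return {"query": {"bool": {"filter": [leaf_query]}}}

def as_constant_score(leaf_query, boost=1.0):
    """Alternative wrapper: constant_score runs the query as a filter and
    gives every matching document the same score (equal to boost)."""
    return {"query": {"constant_score": {"filter": leaf_query, "boost": boost}}}

# The term query from the question, unchanged.
term = {"term": {"products.product_id": {"value": "Ubsdf-234kjasdf"}}}

print(json.dumps(in_filter_context(term), indent=2))
print(json.dumps(as_constant_score(term), indent=2))
```

Either body can be sent to the _search endpoint as-is; which one to prefer is mostly a matter of whether a fixed non-zero score is useful to the caller.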

Related

What if I use a query in filter clauses in Elasticsearch?

What if I use a query in filter clauses in Elasticsearch? Will ES calculate a score?
For example,
case 1:
{
"query": {
"bool": {
"filter": {
"bool":{
"should":{
}
}
}
}
}
}
case 2:
{
"query": {
"bool": {
"should": {
"bool":{
"filter":{
}
}
}
}
}
}
Will ES calculate scores in these two cases?
The filter clause (query) must appear in matching documents. However, unlike
must the score of the query will be ignored. Filter clauses are
executed in filter context, meaning that scoring is ignored and
clauses are considered for caching.
Refer to the Elasticsearch documentation on bool queries to learn more about this.
Below is a working example with index data, search queries, and search results.
Index data:
{
"name": "milk",
"cost": 40
}
{
"name": "bread",
"cost": 55
}
Search Query 1:
Here the inner bool query is wrapped in the outer filter clause, so the score of its should clause is ignored.
{
"query": {
"bool": {
"filter": {
"bool": {
"should": {
"match": {
"name": "bread"
}
}
}
}
}
}
}
Search Result 1:
"hits": [
{
"_index": "64505740",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"name": "bread",
"cost": 55
}
}
]
Search Query 2:
Here the inner bool query contains only a filter clause, so it contributes a score of 0; wrapping it in the outer should clause makes no difference to the score.
{
"query": {
"bool": {
"should": {
"bool": {
"filter": {
"term": {
"name": "bread"
}
}
}
}
}
}
}
Search Result 2:
"hits": [
{
"_index": "64505740",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"name": "bread",
"cost": 55
}
}
]
So both of your search queries return a score of 0.0, meaning that scoring is ignored because of the filter clause.
In Elasticsearch, any query placed under a filter clause is not involved in score calculation, so in both of your queries the logic inside filter is not scored. Logic placed in must or should clauses does contribute to the score. Note that must_not, like filter, runs in filter context and is not scored.
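The rule above can be checked mechanically on a query body. The following is a simplified sketch, not how Elasticsearch itself decides: it only understands nested bool queries, treats must and should as scoring clauses and filter and must_not as non-scoring, and ignores wrappers such as constant_score. The function name `scoring_leaves` is hypothetical.

```python
SCORING_CLAUSES = ("must", "should")
NON_SCORING_CLAUSES = ("filter", "must_not")

def _as_list(clause):
    # A bool clause may hold a single query object or a list of them.
    return clause if isinstance(clause, list) else [clause]

def scoring_leaves(query, scored=True):
    """Return the leaf queries that can still contribute to _score.

    Once a query sits under filter or must_not, everything nested
    inside it is treated as non-scoring.
    """
    if "bool" not in query:
        return [query] if scored else []
    leaves = []
    body = query["bool"]
    for name in SCORING_CLAUSES:
        for sub in _as_list(body.get(name, [])):
            leaves += scoring_leaves(sub, scored)
    for name in NON_SCORING_CLAUSES:
        for sub in _as_list(body.get(name, [])):
            leaves += scoring_leaves(sub, False)
    return leaves

# Search Query 1 and Search Query 2 from the answer above.
case1 = {"bool": {"filter": {"bool": {"should": {"match": {"name": "bread"}}}}}}
case2 = {"bool": {"should": {"bool": {"filter": {"term": {"name": "bread"}}}}}}
# A plain must clause, for contrast.
scored = {"bool": {"must": [{"term": {"title": "c"}}]}}

print(scoring_leaves(case1), scoring_leaves(case2), scoring_leaves(scored))
```

Both cases from the question yield an empty list, which is consistent with the 0.0 scores in the search results, while the plain must query keeps its term leaf.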

How to add fuzziness to a search_as_you_type field in Elasticsearch?

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but I never got the query I needed. Does anyone have an idea how to implement this?
A fuzzy query returns documents that contain terms similar to the search term, as measured by Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- It generates an edit distance based on the length of the term.
For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
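The AUTO thresholds listed above can be written down directly. This is an illustrative sketch: `auto_fuzziness` and `levenshtein` are hypothetical helper names, and the distance function is the plain Levenshtein variant (insert, delete, substitute) without the transposition handling that the `transpositions` option adds.

```python
def auto_fuzziness(term):
    """Maximum number of edits that fuzziness AUTO allows for this term length."""
    n = len(term)
    if n <= 2:
        return 0
    if n <= 5:
        return 1
    return 2

def levenshtein(a, b):
    """Plain Levenshtein distance between two strings (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete a character from a
                           cur[j - 1] + 1,       # insert a character into a
                           prev[j - 1] + cost))  # substitute (or keep) a character
        prev = cur
    return prev[-1]

print(auto_fuzziness("prodc"), levenshtein("prodc", "product"))
```

The term "prodc" has five characters, so AUTO would allow a single edit, while "product" is two edits away from it. A query that needs to match both has to set the fuzziness to 2 explicitly.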
Adding a working example with index data and a search query.
Index Data:
{
"title":"product"
}
{
"title":"prodct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prodc",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0794415,
"_source": {
"title": "prodct"
}
}
]
Refer to these blogs for a detailed explanation of fuzzy queries:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
Refer to this official ES documentation:
The fuzziness , prefix_length , max_expansions , rewrite , and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
There are some open issues and discuss threads stating that fuzziness does not work with bool_prefix multi_match (search-as-you-type):
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3
I know this question was asked long ago, but I think this worked for me.
Since Elasticsearch allows a single field to be indexed in multiple ways (multi-fields), my mapping is as below.
PUT products
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"product_type": {
"type": "search_as_you_type"
}
}
}
}
}
}
After adding some data to the index, I queried it like this.
GET products/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "prodc",
"type": "bool_prefix",
"fields": [
"title.product_type",
"title.product_type._2gram",
"title.product_type._3gram"
]
}
},
{
"multi_match": {
"query": "prodc",
"fuzziness": 2
}
}
]
}
}
}
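The query above combines a bool_prefix clause on the search_as_you_type subfields with a separate fuzzy multi_match clause. A small builder makes the pattern reusable; this is a sketch that only assembles the request body, and the function name `typeahead_query` and its parameters are made up for illustration.

```python
import json

def typeahead_query(text, base_field, fuzziness=2):
    """Build a bool/should query with a prefix clause and a fuzzy clause.

    base_field is assumed to be a search_as_you_type field, which exposes
    the ._2gram and ._3gram subfields used below.
    """
    prefix_clause = {
        "multi_match": {
            "query": text,
            "type": "bool_prefix",
            "fields": [base_field, base_field + "._2gram", base_field + "._3gram"],
        }
    }
    # Same shape as the second clause in the answer: no explicit fields,
    # so the index's default query fields apply.
    fuzzy_clause = {"multi_match": {"query": text, "fuzziness": fuzziness}}
    return {"query": {"bool": {"should": [prefix_clause, fuzzy_clause]}}}

body = typeahead_query("prodc", "title.product_type")
print(json.dumps(body, indent=2))
```

Because both clauses sit under should, a document matching either one is returned, and documents matching both score higher.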

What happens when performing a match query on a date field?

Why can I perform a query of the following type:
GET myindex/_search
{
"query": {
"bool" : {
"must": [
{"match": {"#timestamp": "454545645656"}}
]
}
}
}
when the field type is the following one?
"mappings": {
"fluentd": {
"properties": {
"#timestamp": {
"type": "date"
},
Does this make sense? Does the query value pass through an analyzer, and what is the field compared against?
No. A match query is normally analyzed, meaning the query text goes through the same analyzer that was applied to the field at index time, as explained in the official ES doc. A date field, however, is not analyzed text.
As explained in the official ES doc on the date datatype:
Queries on dates are internally converted to range queries on this
long representation
You can test this yourself by using the explain=true param on your search query. More info about the explain API can be found here.
I did this for your search query, and the explanation part of the result shows the range query on the date field.
URL: /_search?explain=true
"hits": [
{
"_shard": "[date-index][0]",
"_node": "h2H2MJd5T5-b1cUSkHVHcw",
"_index": "date-index",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"#timestamp": "454545645656"
},
"_explanation": {
"value": 1.0,
"description": "#timestamp:[454545645656 TO 454545645656]", --> see range query
"details": []
}
}
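The explanation shows a range whose lower and upper bounds are the same value. The sketch below writes that equivalent range query out explicitly and converts the value to a calendar date, assuming the field uses a format that accepts epoch milliseconds (the default date format does). The helper names are hypothetical and nothing here contacts a cluster.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_millis_to_datetime(value):
    """Interpret a string or int as milliseconds since the Unix epoch (UTC)."""
    return EPOCH + timedelta(milliseconds=int(value))

def equivalent_range_query(field, value):
    """Range query with identical bounds, mirroring the explain output."""
    return {"query": {"range": {field: {"gte": value, "lte": value}}}}

value = "454545645656"
print(epoch_millis_to_datetime(value).isoformat())
print(equivalent_range_query("#timestamp", value))
```

Seen this way, the match query on the date field is an exact-instant lookup rather than a text comparison.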

Sort based on the service time of stores

My project contains some stores with their working hours, and I index them in Elasticsearch. There are some scenarios in my product:
Whenever the client requests the stores that are available now, I use the following range filter:
{
"bool": {
"must": [
{ "range": { "startTime": { "lte": "now" } } },
{ "range": { "endTime": { "gte": "now" } } }
]
}
}
Let's call the result Online stores.
When the client requests all stores, I have to return all the documents, but sorted: first online stores and then the other stores.
I can do that with two queries, one for online and another for offline stores, but I want to do it in one. Any idea?
You can achieve this by using should as an "optional" clause:
If the bool query is in a query context and has a must or filter
clause then a document will match the bool query even if none of the
should queries match. In this case these clauses are only used to
influence the score.
The bool query takes a more-matches-is-better approach, so the score
from each matching must or should clause will be added together to
provide the final _score for each document.
The query might look like this:
POST my-should/doc/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"should": {
"bool": {
"must": [
{
"range": {
"startTime": {
"lte": "2018-06-24T16:39:59"
}
}
},
{
"range": {
"endTime": {
"gte": "2018-06-22T16:39:59"
}
}
}
],
"_name": "Online"
}
}
}
}
}
The must part of this bool query (here match_all) defines which documents match, and the should part boosts those that also match the additional criteria.
Note that here we used Named Queries to highlight that the "Online" part of the query was matched to a document. The response could look like this:
"hits": [
{
"_index": "my-should",
"_type": "doc",
"_id": "BKgZLWQBERN2JBe1CQ5t",
"_score": 3,
"_source": {
"startTime": "2018-06-23T16:39:59",
"endTime": "2018-06-23T16:39:59"
},
"matched_queries": [
"Online"
]
},
{
"_index": "my-should",
"_type": "doc",
"_id": "BagaLWQBERN2JBe12A7y",
"_score": 1,
"_source": {
"startTime": "2018-06-20T16:39:59",
"endTime": "2018-06-21T16:39:59"
}
}
]
Hope that helps!
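For reference, the same query shape can be generated with the date math expression now instead of fixed timestamps, as in the question. This is a minimal sketch; the function name `online_first_query` is made up, and the field names startTime and endTime are taken from the question.

```python
import json

def online_first_query(now="now"):
    """Match every store, but score currently open stores higher.

    The must clause (match_all) decides what is returned; the optional
    should clause only adds to the score of stores that are open.
    """
    online = {
        "bool": {
            "must": [
                {"range": {"startTime": {"lte": now}}},
                {"range": {"endTime": {"gte": now}}},
            ],
            "_name": "Online",  # named query, reported under matched_queries
        }
    }
    return {"query": {"bool": {"must": {"match_all": {}}, "should": online}}}

print(json.dumps(online_first_query(), indent=2))
```

Results are sorted by _score in descending order by default, so stores matching the "Online" clause come first without an explicit sort.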

Elasticsearch: positive boost when term is not present

I'm trying to implement a simple search for products using Elasticsearch.
One of the problems that I'm having is that often search queries have implied terms. For example, consider that when someone types in "lenovo thinkpad battery" they want a battery. However, when someone types in just "lenovo thinkpad" they want a laptop, even though that term doesn't appear in the query.
My solution for this is the following. Manually put together a bunch of related terms. For example, for the computer/laptop category I could have the terms "battery", "keyboard", "power cord", "adapter", "cable", "protection plan" etc. Then, whenever no such term is present in the search query, I positive boost all the results that don't contain those terms.
Is this possible with Elasticsearch?
EDIT:
Example documents
{ "_source": { "item_title": "lenovo thinkpad white/black" } }
{ "_source": { "item_title": "lenovo thinkpad battery" } }
Mapping
{
"properties": {
"item_title": {
"type": "string"
}
}
}
Query
POST my_index/my_type/_search
{
"from": 0,
"size": 10,
"query": {
"match": {
"item_title": "lenovo thinkpad"
}
}
}
Query result:
"hits": {
"total": 2,
"max_score": 0.2169777,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad battery"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad black/white"
}
}
]
}
Notice that the score for these two results is the same. However, since the query "lenovo thinkpad" doesn't contain one of those special terms that I manually picked out, like "battery", I would like documents that don't contain that term to be positive boosted, so that the document with "item_title": "lenovo thinkpad white/black" should have higher score in the query results.
If I execute the following query in my Wikipedia index:
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt)^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
I still get Darmstadt School in the results, but further down the list (it normally comes in the first 10).
If I execute the following query:
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt AND SCHOOL )^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
I get Darmstadt School as the first result despite it being in the NOT clause.
So I suggest you do something similar.
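One way to apply the same idea to the product search is sketched below. Note that this is a different technique from the query_string example above: it uses a bool query whose should clause wraps a must_not filter in constant_score, so that documents without any accessory term receive a fixed bonus. The function name, the boost value, and the simple substring check on the user's text are all assumptions made for illustration; the term list comes from the question.

```python
import json

ACCESSORY_TERMS = ["battery", "keyboard", "power cord", "adapter", "cable", "protection plan"]

def product_query(text, field="item_title", boost=2.0):
    """Match on the user's text; if it names no accessory, boost documents
    that do not mention any accessory term either."""
    query = {"bool": {"must": [{"match": {field: text}}]}}
    lowered = text.lower()
    if not any(term in lowered for term in ACCESSORY_TERMS):
        excluded = [{"match_phrase": {field: term}} for term in ACCESSORY_TERMS]
        # constant_score gives a fixed bonus to every document that passes
        # the must_not filter, i.e. contains none of the accessory terms.
        query["bool"]["should"] = [
            {"constant_score": {"filter": {"bool": {"must_not": excluded}}, "boost": boost}}
        ]
    return {"query": query}

print(json.dumps(product_query("lenovo thinkpad"), indent=2))
```

With this body, "lenovo thinkpad" adds the bonus clause, while "lenovo thinkpad battery" produces a plain match query, so battery listings are not pushed down when the user actually asks for one.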
