Elasticsearch: positive boost when term is not present

I'm trying to implement a simple search for products using Elasticsearch.
One of the problems that I'm having is that often search queries have implied terms. For example, consider that when someone types in "lenovo thinkpad battery" they want a battery. However, when someone types in just "lenovo thinkpad" they want a laptop, even though that term doesn't appear in the query.
My solution for this is the following. Manually put together a bunch of related terms. For example, for the computer/laptop category I could have the terms "battery", "keyboard", "power cord", "adapter", "cable", "protection plan" etc. Then, whenever no such term is present in the search query, I positive boost all the results that don't contain those terms.
Is this possible with Elasticsearch?
EDIT:
Example documents
{"_source": { "item_title": "lenovo thinkpad white/black" }}
{"_source": { "item_title": "lenovo thinkpad battery" }}
Mapping
{
"properties": {
"item_title": {
"type": "string"
}
}
}
Query
POST my_index/my_type/_search
{
"from": 0,
"size": 10,
"query": {
"match": {
"item_title": "lenovo thinkpad"
}
}
}
Query result:
"hits": {
"total": 2,
"max_score": 0.2169777,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad battery"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad white/black"
}
}
]
}
Notice that both results have the same score. However, since the query "lenovo thinkpad" doesn't contain any of the special terms I picked out manually (like "battery"), I would like documents that don't contain those terms to get a positive boost, so that the document with "item_title": "lenovo thinkpad white/black" gets a higher score in the query results.

If I execute the following query on my Wikipedia index
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt)^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
I still get "Darmstadt School" in the results, just further down the list (normally it appears in the top 10).
If I execute the following query
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt AND SCHOOL )^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
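I get "Darmstadt School" as the first result, even though "School" appears in the NOT clause.
So I suggest you do something similar: add a boosted NOT clause for your accessory terms, so that documents that don't contain them get the extra score.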
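The same idea can also be written in the Query DSL instead of query_string. Below is a minimal Python sketch that builds the request body; the accessory list, the helper name build_product_query and the boost of 2.0 are hypothetical. It relies on the fact that a bool query with only must_not clauses matches every document that does not match them, so documents without any accessory term collect the extra should score. The terms query is not analyzed, so the listed terms must be single lowercase tokens; multi-word entries like "power cord" would need match_phrase clauses instead.

```python
import json

# Hypothetical accessory vocabulary for the laptop category. Terms must be
# single lowercase tokens, because the `terms` query is not analyzed.
ACCESSORY_TERMS = ["battery", "keyboard", "adapter", "cable"]


def build_product_query(user_query, field="item_title",
                        accessory_terms=ACCESSORY_TERMS, boost=2.0):
    """Return a search body that boosts documents WITHOUT accessory terms,
    but only when the user did not type any accessory term themselves."""
    match_clause = {"match": {field: user_query}}
    typed_tokens = set(user_query.lower().split())
    if typed_tokens & set(accessory_terms):
        # The user explicitly asked for an accessory: plain relevance is enough.
        return {"query": match_clause}
    return {
        "query": {
            "bool": {
                "must": [match_clause],
                "should": [
                    {
                        # A bool with only must_not matches every document that
                        # contains none of the terms; those documents get `boost`.
                        "bool": {
                            "must_not": [{"terms": {field: accessory_terms}}],
                            "boost": boost,
                        }
                    }
                ],
            }
        }
    }


print(json.dumps(build_product_query("lenovo thinkpad"), indent=2))
```

For "lenovo thinkpad battery" the helper returns the plain match query, so the battery document is not penalized when the user actually asks for it.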

Related

Scoring higher for shorter fields

I'm trying to get a higher score (or at least the same score) for the shortest values on Elastic Search.
Let's say I have these documents: "Abc", "Abca", "Abcb", "Abcc". The field label.ngram uses an EdgeNgram analyser.
With a really simple query like that:
{
"query": {
"match": {
"label.ngram": {
"query": "Ab"
}
}
}
}
I always get first the documents "Abca", "Abcb", "Abcc" instead of "Abc".
How can I get "Abc" first?
(should I use this: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html?)
Thanks!
This happens because of field-length normalization; to get the same score, you have to disable norms on the field.
"Norms store various normalization factors that are later used at query time in order to compute the score of a document relatively to a query."
Adding a working example with index data, mapping, search query, and search result
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"norms": false,
"analyzer": "my_analyzer"
}
}
}
}
Index Data:
{
"title": "Abca"
}
{
"title": "Abcb"
}
{
"title": "Abcc"
}
{
"title": "Abc"
}
Search Query:
{
"query": {
"match": {
"title": {
"query": "Ab"
}
}
}
}
Search Result:
"hits": [
{
"_index": "65953349",
"_type": "_doc",
"_id": "1",
"_score": 0.1424427,
"_source": {
"title": "Abca"
}
},
{
"_index": "65953349",
"_type": "_doc",
"_id": "2",
"_score": 0.1424427,
"_source": {
"title": "Abcb"
}
},
{
"_index": "65953349",
"_type": "_doc",
"_id": "3",
"_score": 0.1424427,
"_source": {
"title": "Abcc"
}
},
{
"_index": "65953349",
"_type": "_doc",
"_id": "4",
"_score": 0.1424427,
"_source": {
"title": "Abc"
}
}
]
As @ESCoder mentioned, disabling norms fixes the ordering, but this is not very useful if you want to rank your search results: it can leave many documents in your results with the same score, which hurts the relevance of your search results a lot.
Maybe you should instead tweak the document-length normalization parameter (b) of the default similarity algorithm (BM25), if you are on ES 5.x or higher. I tried this with your dataset and my settings but couldn't get it to work.
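To see what the length-normalization parameter b does, here is a minimal Python sketch of the classic BM25 weight for a single term. It is a simplification, not Lucene's exact code (Lucene, for example, stores field lengths in a lossy one-byte encoding), and the idf of 1.0 is a placeholder. The token counts assume the edge_ngram tokenizer from the answer above (min_gram 2), so "Abc" yields 2 tokens and "Abca" yields 3. The similarity name no_length_norm in the settings dict is hypothetical.

```python
# Index settings for a custom BM25 similarity with length normalization off
# (b = 0); "no_length_norm" is a hypothetical similarity name.
index_body = {
    "settings": {"index": {"similarity": {"no_length_norm": {"type": "BM25", "b": 0}}}},
    "mappings": {"properties": {"title": {"type": "text", "similarity": "no_length_norm"}}},
}


def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75):
    """Classic BM25 weight of one query term in one document.
    b controls how strongly the field length (doc_len) is normalized."""
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Edge n-gram token counts per title, assuming min_gram=2.
lengths = {"Abc": 2, "Abca": 3, "Abcb": 3, "Abcc": 3}
avg = sum(lengths.values()) / len(lengths)

for b in (0.75, 0.0):
    scores = {doc: round(bm25_term_score(1, n, avg, b=b), 4) for doc, n in lengths.items()}
    print(f"b={b}: {scores}")
```

With b > 0 the shorter field gets the higher weight; with b = 0 the length term drops out and all four titles tie. The sketch only illustrates the role of b; it does not reproduce the exact scores from the question.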
A second option, which should mostly work, is to store the length of your field in a separate field, as you suggested. You would have to populate it from your application, because the analysis process generates several tokens for the same field. However, this adds overhead, and I would prefer tweaking the similarity algorithm's parameter instead.
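A minimal Python sketch of that second option, assuming the application adds the length before indexing. The field name title_length and the helper with_length are hypothetical; the sort first orders by relevance and then prefers shorter titles among ties.

```python
import json


def with_length(doc, field="title"):
    """Return a copy of the document with the raw character length of
    `field` stored under "<field>_length" (computed in the application)."""
    enriched = dict(doc)
    enriched[field + "_length"] = len(doc[field])
    return enriched


docs = [with_length({"title": t}) for t in ["Abca", "Abcb", "Abcc", "Abc"]]

# Relevance first, then the shortest title among equal scores.
search_body = {
    "query": {"match": {"title": "Ab"}},
    "sort": ["_score", {"title_length": {"order": "asc"}}],
}

for doc in docs:
    print(json.dumps(doc))
print(json.dumps(search_body))
```

title_length should be mapped as a numeric type (dynamic mapping would infer long) so that it can be sorted on.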

How to add fuzziness to search as you type field in Elasticsearch?

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but never managed to write the right query. Does anyone have an idea how to implement this?
Fuzzy Query returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- generates an edit distance based on the length of the term. For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
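To make the rules above concrete, here is a minimal Python sketch of plain Levenshtein distance together with the default AUTO thresholds (AUTO is equivalent to AUTO:3,6). The helper names are hypothetical. With transpositions enabled, Elasticsearch also counts swapping two adjacent characters as a single edit, which plain Levenshtein does not.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute
        prev = cur
    return prev[-1]


def auto_fuzziness(term, low=3, high=6):
    """Edits allowed by AUTO:low,high; the defaults match plain AUTO."""
    n = len(term)
    if n < low:
        return 0
    if n < high:
        return 1
    return 2


for candidate in ["product", "prodct"]:
    print(candidate, levenshtein("prodc", candidate), auto_fuzziness("prodc"))
```

Under AUTO, "prodc" (5 characters) allows one edit, while "product" is two edits away, so "product" is matched in the example only because fuzziness is set to 2 explicitly.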
Adding a working example with index data and search query.
Index Data:
{
"title":"product"
}
{
"title":"produt"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prodc",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0794415,
"_source": {
"title": "produt"
}
}
]
Refer to these blogs for a detailed explanation of the fuzzy query:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
From the official ES documentation:
"The fuzziness, prefix_length, max_expansions, rewrite, and fuzzy_transpositions parameters are supported for the terms that are used to construct term queries, but do not have an effect on the prefix query constructed from the final term."
There are some open issues and discussion threads stating that fuzziness does not work with the bool_prefix multi_match used for search-as-you-type:
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3
I know this question was asked long ago, but this worked for me.
Since Elasticsearch lets you index a single field with multiple data types (multi-fields), my mapping looks like this:
PUT products
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"product_type": {
"type": "search_as_you_type"
}
}
}
}
}
}
After adding some data to the index, I queried it like this:
GET products/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "prodc",
"type": "bool_prefix",
"fields": [
"title.product_type",
"title.product_type._2gram",
"title.product_type._3gram"
]
}
},
{
"multi_match": {
"query": "prodc",
"fuzziness": 2
}
}
]
}
}
}
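The two-clause query above can be generated for arbitrary input with a small Python helper. This is a sketch: the helper name sayt_fuzzy_query is hypothetical, and restricting the fuzzy clause to the plain title field is my own assumption. The answer leaves fields unset, in which case multi_match falls back to the index.query.default_field setting.

```python
import json


def sayt_fuzzy_query(text, field="title", sayt_subfield="product_type", fuzziness=2):
    """Combine a bool_prefix clause on the search_as_you_type sub-field
    (and its shingle sub-fields) with a fuzzy clause on the text field."""
    sayt = f"{field}.{sayt_subfield}"
    return {
        "query": {
            "bool": {
                "should": [
                    {"multi_match": {
                        "query": text,
                        "type": "bool_prefix",
                        "fields": [sayt, f"{sayt}._2gram", f"{sayt}._3gram"],
                    }},
                    # Fuzziness is honored here because this is not bool_prefix.
                    {"multi_match": {
                        "query": text,
                        "fields": [field],
                        "fuzziness": fuzziness,
                    }},
                ]
            }
        }
    }


print(json.dumps(sayt_fuzzy_query("prodc"), indent=2))
```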

Elasticsearch - pass fuzziness parameter in query_string

I have a fuzzy query with customized AUTO:10,20 fuzziness value.
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
How to convert it to a query_string query? I tried nike~AUTO:10,20 but it is not working.
It's possible with query_string as well. Using the same example the OP provided, both the OP's match query and the query_string query below fetch the same document with the same score.
According to these two ES docs pages, Elasticsearch supports the AUTO:10,20 format, as my example also shows.
Index mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index some doc
{
"name" : "nike"
}
Search query using match with fuzziness
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Query_string with fuzziness
{
"query": {
"query_string": {
"fields": ["name"],
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Lucene syntax only allows you to specify "fuzziness" with the tilde symbol "~", optionally followed by 0, 1 or 2 to indicate the edit distance.
The Elasticsearch Query DSL supports a configurable special value, AUTO, which is then used to build the proper Lucene query.
You would need to implement that logic on your application side, by evaluating the desired edit distance based on the length of your search term and then use <searchTerm>~<editDistance> in your query_string-query.
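A minimal Python sketch of that application-side logic, assuming the same thresholds as AUTO:low,high (a term shorter than low gets 0 edits, shorter than high gets 1, otherwise 2). The helper names parse_auto and fuzzy_query_string are hypothetical, and the naive whitespace split does not handle query_string operators, quoted phrases or special characters.

```python
import re


def parse_auto(spec):
    """Parse "AUTO" or "AUTO:low,high" into (low, high); plain AUTO means AUTO:3,6."""
    if spec == "AUTO":
        return 3, 6
    m = re.fullmatch(r"AUTO:(\d+),(\d+)", spec)
    if not m:
        raise ValueError(f"unsupported fuzziness spec: {spec!r}")
    return int(m.group(1)), int(m.group(2))


def fuzzy_query_string(text, spec="AUTO"):
    """Append ~N to each whitespace-separated term, with N chosen from the
    term length the way AUTO:low,high would choose it."""
    low, high = parse_auto(spec)
    parts = []
    for term in text.split():
        n = len(term)
        edits = 0 if n < low else 1 if n < high else 2
        parts.append(f"{term}~{edits}")
    return " ".join(parts)


body = {"query": {"query_string": {"fields": ["name"],
                                   "query": fuzzy_query_string("nike", "AUTO:10,20")}}}
print(body)
```

For "nike" with AUTO:10,20 the term is shorter than 10 characters, so the generated query is "nike~0", i.e. an exact match.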

Elasticsearch query prefer exact match over partial match on multiple fields

I am doing a free text search on documents with multiple fields. When I perform a search I want the documents that have a perfect match on any of the labels to have a higher scoring. Is there any way I can do this from the query?
For example the documents have two fields called label-a and label-b and when I perform the following multi-match query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "apple",
"type": "most_fields",
"fields": [
"label-a",
"label-b"
]
}
}
]
}
}
}
I get the following results (only the relevant part):
"hits": [
{
"_index": "salad",
"_type": "fruit",
"_id": "4",
"_score": 0.581694,
"_source": {
"label-a": "apple pie and pizza",
"label-b": "pineapple with apple juice"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "2",
"_score": 0.1519148,
"_source": {
"label-a": "grape",
"label-b": "apple"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "1",
"_score": 0.038978107,
"_source": {
"label-a": "apple apple apple apple apple apple apple apple apple apple apple apple",
"label-b": "raspberry"
}
},
{
"_index": "salad",
"_type": "fruit",
"_id": "3",
"_score": 0.02250402,
"_source": {
"label-a": "apple pie and pizza",
"label-b": "raspberry"
}
}
]
I want the second document, the one with the value grape for label-a and value apple for label-b, to have the highest score as I am searching for the value apple and one of the labels has that exact value. This should work regardless of which label the exact term appears.
You are getting these results because Elasticsearch uses a tf/idf model for scoring. Try mapping "label-a" and "label-b" additionally as not-analyzed (raw) fields, then rewrite your query something like this:
{
"query": {
"bool": {
"should": {
"match": {
"label-a.raw": {
"query": "apple",
"boost": 2
}
}
},
"must": [
{
"multi_match": {
"query": "apple",
"type": "most_fields",
"fields": [
"label-a",
"label-b"
]
}
}
]
}
}
}
The should clause will boost documents with an exact match, so they will probably come first. Add a similar should clause for "label-b.raw" so that an exact match on either label is boosted. Play with the boost value, and please check the query before running it; this is just an idea of what you can do.
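Since the question asks for a boost regardless of which label matches, here is a minimal Python sketch that builds one boosted term clause per label. It assumes each label has a keyword sub-field called raw (the mapping dict shows that assumption), and the helper name exact_match_boosted_query is hypothetical. Keyword matching is exact and case-sensitive.

```python
import json

LABELS = ("label-a", "label-b")

# Assumed mapping: each label is analyzed text plus an exact keyword sub-field "raw".
mapping = {
    "mappings": {
        "properties": {
            name: {"type": "text", "fields": {"raw": {"type": "keyword"}}}
            for name in LABELS
        }
    }
}


def exact_match_boosted_query(text, fields=LABELS, boost=2):
    """Keep the analyzed most_fields match as the required clause and add one
    boosted exact `term` clause per label, so either label can trigger it."""
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "type": "most_fields",
                                          "fields": list(fields)}}],
                "should": [{"term": {f"{name}.raw": {"value": text, "boost": boost}}}
                           for name in fields],
            }
        }
    }


print(json.dumps(exact_match_boosted_query("apple"), indent=2))
```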

Elasticsearch GET the last document for a given field if it exists

I have a short question which seems to be simple, but I wasn't able to find any answer so far.
I want to retrieve from an Elasticsearch node the latest document according to a date field, but only among documents that contain a specific field.
For instance, let's say I want to get the last purchase that contains the field "promotionCode":
Query:
http://elasticsearch:9200/store1/purchase/_search?q=vendor:Marie&size=1&sort=date:desc
where store1 is my index and purchase is a document type.
Now let's say I have these two documents in Elasticsearch:
"hits": [
{
"_index": "store1",
"_type": "purchase",
"_id": "1",
"_score": 1,
"_source": {
"date": "2016-03-16T12:53:16.000Z",
"vendor": "Marie",
"promotionCode": "XYZ123"
}
},
{
"_index": "store1",
"_type": "purchase",
"_id": "2",
"_score": 1,
"_source": {
"date": "2016-03-18T12:53:16.000Z",
"vendor": "Marie"
}
}
]
The above query will retrieve the document of id 2, but I will not have any field "promotionCode" in my result.
How do I get the latest document that contains a specific field?
I explored the "fields" filter, but it only returns an empty document when the field is missing, and I read about source filtering but I'm not sure it does what I want.
Thanks a lot for any hint!
You can try this query:
{
"query": {
"match": { "vendor": "Marie" }
},
"post_filter": {
"exists": { "field": "promotionCode" }
},
"sort": { "date": "desc" },
"size": 1
}
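You can use the exists query, combined with the vendor condition in the request body:
GET /store1/purchase/_search
{
"query": {
"bool": {
"must": { "match": { "vendor": "Marie" } },
"filter": { "exists": { "field": "promotionCode" } }
}
},
"sort": { "date": "desc" },
"size": 1
}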
Hope it helps!!
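To sanity-check what the query should return, here is a small Python sketch that applies the same logic (vendor match, field exists, date descending, size 1) to the two sample documents from the question. The helper names are hypothetical, and it only approximates Elasticsearch: the exists query also treats null values and empty arrays as missing, which the plain key check below ignores.

```python
from datetime import datetime

# The two sample documents from the question.
purchases = [
    {"_id": "1", "date": "2016-03-16T12:53:16.000Z", "vendor": "Marie", "promotionCode": "XYZ123"},
    {"_id": "2", "date": "2016-03-18T12:53:16.000Z", "vendor": "Marie"},
]


def parse_date(s):
    """Parse the ISO-8601 timestamps used in the sample documents."""
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ")


def latest_with_field(docs, vendor, field):
    """Mimic: match vendor, keep docs where `field` is present,
    sort by date descending, return the first hit (size 1)."""
    candidates = [d for d in docs if d.get("vendor") == vendor and field in d]
    candidates.sort(key=lambda d: parse_date(d["date"]), reverse=True)
    return candidates[0] if candidates else None


print(latest_with_field(purchases, "Marie", "promotionCode"))
```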
