Give a higher score to documents that contain all query terms - elasticsearch

I have a problem with scoring in Elasticsearch. When a user enters a query that contains three terms, a document that repeats two of the words many times sometimes outscores a document that contains all three. For example, if the user enters "elasticsearch query tutorial", I want documents that contain all of these words to score higher than a document that merely repeats "tutorial" and "elasticsearch" many times.
PS: I am using minimum_should_match and shingles in my query. Although they made the ranking a lot better, they did not solve this problem completely. I need something like the query coordination factor in Lucene's practical scoring function. Is there anything like that in Elasticsearch with BM25?

One of the possible solutions could be using function score:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"functions": [
{
"filter": { "match": { "title": "elasticsearch" } },
"weight": 1
},
{
"filter": { "match": { "title": "tutorial" } },
"weight": 1
}
],
"score_mode": "sum"
}
}
}
In this case, documents that match more of the terms clearly rank higher. However, because the base query is match_all, this completely ignores TF-IDF (or BM25) and any other relevance signals.
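A possible variation that keeps the relevance score is to use the real match query as the base and let each per-term filter add a fixed bonus. The sketch below is only an illustration: the helper name build_coverage_query, the title field, the naive whitespace split, and the weight of 10 are all assumptions, not part of the original answer. With score_mode and boost_mode both set to sum, the final score is the base query score plus the weight times the number of matched terms, so a sufficiently large weight lets term coverage dominate.

```python
import json


def build_coverage_query(text, field="title", term_weight=10.0):
    """Build a function_score body that adds a fixed bonus per matched term."""
    # Naive whitespace tokenization (assumption); the index analyzer may differ.
    terms = text.lower().split()
    functions = [
        {"filter": {"match": {field: term}}, "weight": term_weight}
        for term in terms
    ]
    return {
        "query": {
            "function_score": {
                # Base query keeps the normal relevance score.
                "query": {"match": {field: text}},
                "functions": functions,
                # Sum the bonuses of all matching filters...
                "score_mode": "sum",
                # ...and add that sum to the base query score.
                "boost_mode": "sum",
            }
        }
    }


body = build_coverage_query("elasticsearch query tutorial")
print(json.dumps(body, indent=2))
```

The weight has to be tuned against the typical score range of the base query; if it is too small, a document with many repetitions of two terms can still win.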

Related

Match query fuzzily to an array of candidates

I have an index in elastic with the following document structure:
{
"questions": [
"What is your name?",
"How are you called?",
"What should I call you?",
...
],
"answer": "<answer>"
}
I would like to match queries to one of the entries in the questions array.
For example, the query "What's your name?"
The returning document should be the one with the closest matching entry of questions in all the documents in the index.
I have tried:
{
"query": {
"match": { "questions": { "query": "<question>", "fuzziness": "auto" } }
}
}
But that sometimes returns a "wrong" document, even when the query exactly equals one of the entries of questions in some document.
I've also tried
{
"query": {
"match_phrase": { "questions": "<query>" }
}
}
But that doesn't allow fuzziness, and since the queries are human inputs, it doesn't catch enough cases.
And lastly I tried
{
"query": {
"span_near": {
"clauses": [
{ "span_multi": {
"match": {
"fuzzy": {
"questions": { "fuzziness": "auto", "value": "<first word of the query>" }
}
}
} },
{ "span_multi": {
"match": {
"fuzzy": {
"questions": { "fuzziness": "auto", "value": "<second word of the query>" }
}
}
} },
...
]
}
}
}
But that (at least as far as I seem to notice) only matches questions exactly with fuzzy words.
What I would like (at least as far as I understand) is a fuzzy TF-IDF across all entries of questions: find the best-matching entry in each document, then rank the documents by that best match rather than by the questions array as a whole.
I'm a pretty inexperienced novice when it comes to Elastic, so I appreciate any tips and tricks or outright solutions you might have for me, thank you!

Elasticsearch - Impact of adding Boost to query

I have a very simple Elasticsearch query, shown below.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"tag": {
"query": "Audience: PRO Brand: Samsung",
"boost": 3,
"operator": "and"
}
}
},
{
"match": {
"tag": {
"query": "audience: PRO brand samsung",
"boost": 2,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
I want to know whether adding a boost to the query has any performance impact, and whether boosting helps on a very large data set where the search words occur frequently.
Elasticsearch adds the boost parameter with a default value anyway, so IMO giving it a different value won't make much difference in performance, but you should measure it yourself.
Regarding your second question, adding a boost definitely makes sense where your search words are common, since it helps you surface the relevant documents. For example, suppose you are searching for "query" in an index containing Elasticsearch posts (where "query" will be very common), but you want to give more weight to documents that have the tag elasticsearch-query. Adding a boost in this case will give you more relevant results.
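To make that example concrete, here is a minimal sketch in Python of such a request body. The field names body and tags, the helper name build_boosted_query, and the boost value of 3 are assumptions for illustration; tags is assumed to be a keyword field so that a term query applies.

```python
def build_boosted_query(text, preferred_tag, tag_boost=3.0):
    """Match on the text, and score tagged documents higher via a boosted should clause."""
    return {
        "query": {
            "bool": {
                # Every hit must match the search text.
                "must": [{"match": {"body": text}}],
                # Optional clause: documents carrying the tag get extra score.
                "should": [
                    {"term": {"tags": {"value": preferred_tag, "boost": tag_boost}}}
                ],
            }
        }
    }


request_body = build_boosted_query("query", "elasticsearch-query")
print(request_body)
```

Because the tag clause sits in should next to a must clause, it never excludes documents; it only raises the score of those that carry the tag.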

ElasticSearch Ignoring words having one single letter

I'm a beginner in Elasticsearch. I have an application that uses Elasticsearch to look for ingredients in a given food or fruit...
I'm facing a problem with scoring when the user types, for example, "Vitamine d".
Elasticsearch gives the phrase "vitamine" the best score, even though the phrase "Vitamine D" exists and should normally have the highest score.
It seems that if the second word ("d" in my case) is just one letter, Elasticsearch ignores it.
I tried another example, "vitamine b12", and got the correct score.
Here is the query that the application sends to the server:
{
"from": 0,
"size": 5,
"query": {
"bool": {
"must": [
{
"match": {
"constNomFr": {
"query": "vitamine d"
}
}
}
],
"should": [
{
"prefix": {
"constNomFr": {
"value": "vitamine d",
"boost": 2
}
}
}
]
}
},
"_source": {
"excludes": [
"alimentDtos"
]
}
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend indexing them in a separate field, "ingredients", with its type set to keyword. That way you can use a term filter and even run aggregations.
You may already have your documents indexed that way; in that case, if you are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array, then you should take a look at the Elasticsearch analyzers to choose or build the right one.
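As a rough sketch of the .keyword suggestion, the Python snippet below keeps the match query from the question and adds a boosted term clause on the keyword sub-field. It assumes the default dynamic mapping created a constNomFr.keyword sub-field; the helper name build_exact_first_query and the boost value are made up for illustration. Note that a keyword field stores the value unanalyzed, so the term clause only fires when the input equals the stored value exactly, including case.

```python
def build_exact_first_query(user_input, field="constNomFr", exact_boost=5.0):
    """Full-text match, with an extra score for an exact keyword match."""
    return {
        "from": 0,
        "size": 5,
        "query": {
            "bool": {
                # Analyzed full-text match, as in the original request.
                "must": [{"match": {field: {"query": user_input}}}],
                # Exact, unanalyzed comparison against the keyword sub-field.
                "should": [
                    {
                        "term": {
                            field + ".keyword": {
                                "value": user_input,
                                "boost": exact_boost,
                            }
                        }
                    }
                ],
            }
        },
    }


request_body = build_exact_first_query("Vitamine D")
print(request_body)
```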

elasticsearch function score, boost weight of "number of matched terms in query" (coordination)

I want to use the Elasticsearch function score for customized scoring, and these are my ranking priorities, in order:
1. Number of terms in common with the query (for example, a document that has 3 of the 4 query terms should rank higher than a document that has 2 of the 4, no matter what the tf/idf score of each term is). The Elasticsearch documentation calls this the coordination factor.
2. Sum of the relevance of the terms (tf/idf).
3. Document popularity (number of votes for each document, as described in "boosting by popularity").
This is the request body currently sent to Elasticsearch:
body = {
"query": {
"function_score": {
"query": {"match": {"text": query}},
"functions": [
{
"field_value_factor": {
"field": "ducoumnet_popularity",
}
}
],
}
}
}
The problem is that this request does not satisfy the first priority. For example, document A may have fewer terms in common with the query than document B, yet because its matching terms have a higher tf/idf score, document A is ranked above document B.
To prevent this, I think the best way is to boost the score of documents by the coordination factor. Is there any way to do this? Something similar to this request:
body = {
"query": {
"function_score": {
"query": {"match": {"text": query}},
"functions": [
{
"field_value_factor": {
"field": "ducoumnet_popularity",
}
},
{
# wished-for syntax only: "_coordination" is not an existing field
"field_value_factor": {
"field": "_coordination"
},
"weight": 10
}
],
}
}
}
I didn't find an exact answer to this question, but it may help someone to know that you can enforce a minimum precision for the documents in the result using minimum_should_match.
{
"query": {
"match": {
"content": {
"query": "quick brown dog",
"minimum_should_match": "75%"
}
}
}
}
It accepts many different configurations. More explanation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html
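To see what a percentage means for a given query, the small Python sketch below reproduces the rounding rule from the linked documentation for positive percentages: the computed number of required clauses is rounded down. The function name required_matches is made up for illustration, and negative or combined specifications are not covered.

```python
def required_matches(num_terms, percent):
    """Number of optional clauses that must match for a positive integer percentage."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Integer division rounds down, matching the documented behaviour.
    return (num_terms * percent) // 100


print(required_matches(3, 75))  # 2
print(required_matches(4, 75))  # 3
```

So for "quick brown dog" with 75%, two of the three terms are enough, which means documents missing one term are still returned.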

elasticsearch boost importance of exact phrase match

Is there a way in Elasticsearch to boost the importance of the exact phrase appearing in the document?
For example, if I were searching for the phrase "web developer" and the words "web developer" appeared together, they would be boosted by 5 compared to "web" and "developer" appearing separately throughout the document. That way, any document that contained "web developer" together would appear first in the results.
You can combine different queries using a bool query, and you can assign a different boost to each of them as well. Let's say you have a regular match query for both terms, regardless of their positions, and then a phrase query with a higher boost.
Something like the following:
{
"query": {
"bool": {
"should": [
{
"match": {
"field": "web developer"
}
},
{
"match_phrase": {
"field": {
"query": "web developer",
"boost": 5
}
}
}
],
"minimum_should_match": 1
}
}
}
As an alternative to javanna's answer, you could do something similar with must and should clauses within a bool query:
{
"query": {
"bool": {
"must": {
"match": {
"field": {
"query": "web developer",
"operator": "and"
}
}
},
"should": {
"match_phrase": {
"field": "web developer"
}
}
}
}
}
Untested, but I believe the must clause here will match results containing both 'web' and 'developer' and the should clause will score phrases matching 'web developer' higher.
You could try using rescore to run an exact phrase match on your initial results. From the docs:
"Rescoring can help to improve precision by reordering just the top (eg 100 - 500) documents returned by the query and post_filter phases, using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index."
https://www.elastic.co/guide/en/elasticsearch/reference/current/filter-search-results.html#rescore
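A minimal sketch of such a request in Python, following the rescore structure from the linked documentation: the main query is a plain match, and a match_phrase query rescores the top hits. The helper name build_rescore_request, the field name, the window size, and the weights are assumptions for illustration.

```python
def build_rescore_request(text, field="field", window_size=100, phrase_weight=5.0):
    """Cheap match query first, then a phrase query on the top window_size hits per shard."""
    return {
        "query": {"match": {field: text}},
        "rescore": {
            "window_size": window_size,
            "query": {
                # Only the top hits are re-evaluated with the phrase query.
                "rescore_query": {"match_phrase": {field: {"query": text}}},
                # Weight of the original score and of the phrase score.
                "query_weight": 1.0,
                "rescore_query_weight": phrase_weight,
            },
        },
    }


request_body = build_rescore_request("web developer")
print(request_body)
```

Documents outside the rescore window keep their original order, so the window should be at least as large as the number of results you actually display.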
I used the sample query below in my case, and it works. It returns both exact and fuzzy results, but the exact ones are boosted!
{ "query": {
"bool": {
"should": [
{
"match": {
"name": "pala"
}
},
{
"fuzzy": {
"name": "pala"
}
}
]
}}}
I do not have enough reputation to comment on James Adison's answer, which I agree with.
What is still missing is the boost factor, which can be done using the following syntax:
{
"match_phrase":
{
"fieldName": {
"query": "query string for exact match",
"boost": 10
}
}
}
I think that is already the default behaviour of a match query with the "or" operator. It will match the phrase "web developer" first and then terms like "web" or "developer". You can still boost your query as shown in the answers above. Correct me if I'm wrong.

Resources