Elasticsearch how to replace "terms" query? - elasticsearch

Using Elasticsearch before version 6 the following query returned hits based on similarity to the query. Now in ES 6+ this returns hit based on whether they match any part of the query, all with the same score. This change breaks the rest of the more complicated query (not shown).
How can I get hit scores that match the older ES 5- for the following query
{
"query": {
"bool": {
"should": [
{
"terms": {
"some_field_name": [
"iPad Pro",
"iPhone 8"
]
}
}
]
}
}
}
In ES 5 this returns hits scored higher for matching all terms and lower for matching less. In ES 6+ this returns hits with only scores = 1 and so they are ranked with no regards to how many matched terms there were.
For instance a hit that has both terms is ranked higher in ES 5:
"_score": 0.87546873, when 2 of 4 terms match
"_score": 0.60353506, when 2 of 5 terms match
"_score": 0.13353139, when 1 of 4 terms match
This is much like the desired "cosine" similarity that we need (understood that this is not precisely how scores are created).
What query will return the same scores as ES 5 does for above query. In other words, what is the ES 6 equivalent query?

Note: Just posting it as a draft suggestion since discussion in comments is getting longer and clumsy.
UPDATE: Just checked both the methods I suggested and both seem to give similar scoring. Please try these methods and see if the scores you get are of any relevance to you.
I get that it doesn't work for terms but I suggested to replace it with either multiple "term" filter which also has a boost parameter or else replace it with "term_set" query and set required_matches param to "1" since we want the "or" behaviour.
1) Using terms_set query:
{
"query": {
"bool": {
"should": [
{
"terms_set": {
"some_field_name": {
"terms" : ["iPad Pro", "iPhone 8"],
"minimum_should_match_script": {
"source": "1"
}
}
}
}
]
}
}
}
2) Using multiple term filters instead of single terms filter:
{
"query": {
"bool": {
"should": [
{
"term": {
"some_field_name": "iPad Pro"
}
},
{
"term": {
"some_field_name": "iPhone 8"
}
}
]
}
}
}
If you were using boost in terms, then you should apply same boost in each of term blocks.

Related

Elasticsearch - Impact of adding Boost to query

I have a very simple Elastic query mentioned below.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"tag": {
"query": "Audience: PRO Brand: Samsung",
"boost": 3,
"operator": "and"
}
}
},
{
"match": {
"tag": {
"query": "audience: PRO brand samsung",
"boost": 2,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
I want to know if I add a boost in the query, will there be any performance impact because of this, and also will boosting help if you have a very large data set, where the occurrence of a search word is common.
Elasticsearch adds boost param with default value, IMO giving different value won't make much difference in the performance, but you should be able to measure it yourself.
Reg. your second question, adding boost definitely makes sense where the occurrence of your search words are common, this will help you to find the relevant document. for example: suppose you are searching for query in a index containing Elasticsearch posts(query will be very common on Elasticsearch posts), but you want the give more weight to documents which have tag elasticsearch-query. Adding boosts in this case, will provide you more relevant results.

What is the difference between should and boost final score calculation?

I'm a little confused about what is the difference between should and boost final score calculation
when a bool query has a must clause, the should clauses act as a boost factor, meaning none of them have to match but if they do, the relevancy score for that document will be boosted and thus appear higher in the result.
so,if we have:
one query which contains must and should clauses
vs
second query which contains must clause and boosting clause
Is there a difference ?
when you recommend to use must and should vs must and boosting clauses in a query ?
You can read the documentation of boolean query here, there is huge difference in the should and boost.
Should and must both contributes to the _score of the document, and as mentioned in the above documentation, follows the
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
While boost is a parameter, using which you can increase the weight according to your value, let me explain that using an example.
Index sample docs
POST _doc/1
{
"brand" : "samsung",
"name" : "samsung phone"
}
POST _doc/2
{
"brand" : "apple",
"name" : "apple phone"
}
Boolean Query using should without boost
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple"
}
}
}
]
}
}
}
Search result showing score
"max_score": 1.3862942,
Now in same query use boost of factor 10
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "apple"
}
}
},
{
"match": {
"brand": {
"query": "apple",
"boost": 10 --> Note additional boost
}
}
}
]
}
}
}
Query result showing boost
"max_score": 7.624619, (Note considerable high score)
In short, when you want to boost a particular document containing your query term, you can additionally pass the boost param and it will be on top of the normal score calculated by should or must.

ElasticSearch Ignoring words having one single letter

I'm a beginner in ElasticSearch, I have an application that uses elasticSearch to look for ingredients in a given food or fruit...
I'm facing a problem with scoring if the user for example tapes: "Vitamine d"
ElasticSearch will give the "vitamine" phrase that has the best scoring even if the phrase "Vitamine D" exists and normally it should have the highest score.
I see that if the second word "d" in my case is just one letter then elastic search will ignore it.
I did another example: "vitamine b12" and I had the correct score.
Here is the query that the application send to the server:
{
"from": 0,
"size": 5,
"query": {
"bool": {
"must": [
{
"match": {
"constNomFr": {
"query": "vitamine d"
}
}
}
],
"should": [
{
"prefix": {
"constNomFr": {
"value": "vitamine d",
"boost": 2
}
}
}
]
}
},
"_source": {
"excludes": [
"alimentDtos"
]
}
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend you to index them on a separate field "ingredients" setting it's type to keyword. This way you can use a term filter and you can even run aggregations.
You may already have your documents indexed that way, in that case if your are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array then you should take a look to the elasticsearch analyzers to choose or build the right one.

How to boost individual documents

I have a pretty complex query and now I want to boost some documents that fulfill some criteria. I have the following simplified document structure and I try to give some documents a boost based on the id, genre, tag.
{
"id": 123,
"genres": ["ACTION", "DRAMA"],
"tags": ["For kids", "Romantic", "Nature"]
}
What I want to do is for example
id: 123 boost: 5
genres: ACTION boost: 3
tags: Romantic boost: 0.2
and boost all documents that are contained in my query and fit the criteria but I don't want to filter them out. So query clause boosting is not of any help I guess.
Edit: To make if easier to understand what I want to achieve (not sure if it is possible with elasticsearch, no is also a valid answer).
I want to search with a query and get a result set. In this set I want to boost some documents. But I don't want to enlarge the result set or filter it. The boost should be independent from the query.
For example I search for a specific tag and want to boost all documents with category 'ACTION' in the result set. But I don't want all documents with category 'ACTION' in the result set and also I don't want only documents with the specific tag AND category 'ACTION'.
I think you need to have Dynamic boosting during query time.
The first matches the id title with boost and second one matches the 'genders' ACTION.
{
"query": {
"bool": {
"should": [
{
"match": {
"title": {
"query": "id",
"boost": 5
}
}
},
{
"match": {
"content": "Action"
}
}
]
}
}
}
If you want to have multi_match match based on your query:
{
"multi_match" : {
"query": "some query terms here",
"fields": [ "id^5", "genders^3", "tags^0.2" ]
}
}
Note: the ^5 means boost for the title.
Edit:
Maybe you are asking for different types of multi_match queries (at least for ES 5.x) from the ES reference guide:
best_fields
(default) Finds documents which match any field, but uses
the _score from the best field. See best_fields.
most_fields
Finds documents which match any field and combines the _score from
each field. See most_fields.
cross_fields
Treats fields with the same analyzer as though they were one big
field. Looks for each word in any field. See cross_fields.
phrase
Runs a match_phrase query on each field and combines the _score from
each field. See phrase and phrase_prefix.
phrase_prefix
Runs a match_phrase_prefix query on each field and combines the _score
from each field. See phrase and phrase_prefix.
More at: ES 5.4 ElasticSearch reference
I found a solution and it was pretty simple. I use a boosting query. I now just nest the different boosting criteria with and my original query is now the base query.
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-boosting-query.html
For example:
{
"query": {
"boosting": {
"positive": {
"boosting": {
"positive": {
"match": {
"director": "Spielberg"
}
},
"negative": {
"term": {
"genres": "DRAMA"
}
},
"negative_boost": 1.3
}
},
"negative": {
"term": {
"tags": "Romantic"
}
},
"negative_boost": 1.2
}
}
}

Elasticsearch fuzzy matching: How can I get direct hits first?

I'm using Elasticsearch to search names in a database, and I want it to be fuzzy to allow for minor spelling errors. Based on the advice I've found on the matter, I'm using "match" and "fuzziness" instead of "fuzzy", which definitely seems to be more accurate. This is my query:
{ "query":
{ "match":
{ "last_name":
{ "query": "Beach",
"type": "phrase",
"fuzziness": 2
}
}
}
}
However, even though I have numerous results with last_name "Beach" (I know there's at least 100), I also get results with last_name "Beech" and "Berch" in the first 10 hits returned by my query. Can someone help me figure out how to get the exact matches first?
Try changing your query to a boolean query with 2 should queries.
The first one being your current query, and then second being a query that only gives exact matches, then give that one a big boost (like 10.0).
That should get your exact matches on top while still listing your partial matches.
I tried to edit "Constantijn" answer above to include sample based on his answer, but still not appearing (pending approval). So, I will just put a sample here instead...
{
"query": {
"bool": {
"should": [
{
"match": {
"last_name": {
"query": "Beach",
"fuzziness": 2,
"boost": 1
}
}
},
{
"match": {
"last_name": {
"query": "Beach",
"boost": 10
}
}
}
]
}
}
}

Resources