ElasticSearch Ignoring words having one single letter - elasticsearch

I'm a beginner in ElasticSearch, I have an application that uses elasticSearch to look for ingredients in a given food or fruit...
I'm facing a problem with scoring if the user for example tapes: "Vitamine d"
ElasticSearch will give the "vitamine" phrase that has the best scoring even if the phrase "Vitamine D" exists and normally it should have the highest score.
I see that if the second word "d" in my case is just one letter then elastic search will ignore it.
I did another example: "vitamine b12" and I had the correct score.
Here is the query that the application send to the server:
{
"from": 0,
"size": 5,
"query": {
"bool": {
"must": [
{
"match": {
"constNomFr": {
"query": "vitamine d"
}
}
}
],
"should": [
{
"prefix": {
"constNomFr": {
"value": "vitamine d",
"boost": 2
}
}
}
]
}
},
"_source": {
"excludes": [
"alimentDtos"
]
}
}
What could I modify to make it work?
Thank you so much.

If you can identify your ingredients, I recommend you to index them on a separate field "ingredients" setting it's type to keyword. This way you can use a term filter and you can even run aggregations.
You may already have your documents indexed that way, in that case if your are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array then you should take a look to the elasticsearch analyzers to choose or build the right one.

Related

Elasticsearch - Impact of adding Boost to query

I have a very simple Elastic query mentioned below.
{
"query": {
"bool": {
"must": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"tag": {
"query": "Audience: PRO Brand: Samsung",
"boost": 3,
"operator": "and"
}
}
},
{
"match": {
"tag": {
"query": "audience: PRO brand samsung",
"boost": 2,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
I want to know if I add a boost in the query, will there be any performance impact because of this, and also will boosting help if you have a very large data set, where the occurrence of a search word is common.
Elasticsearch adds boost param with default value, IMO giving different value won't make much difference in the performance, but you should be able to measure it yourself.
Reg. your second question, adding boost definitely makes sense where the occurrence of your search words are common, this will help you to find the relevant document. for example: suppose you are searching for query in a index containing Elasticsearch posts(query will be very common on Elasticsearch posts), but you want the give more weight to documents which have tag elasticsearch-query. Adding boosts in this case, will provide you more relevant results.

How to rank ElasticSearch documents based on scores

I have an Elastic search index that contain thousands of documents, each document represent a user.
each document has set of fields (is_verified: boolean, country: string, is_creator: boolean), also i have another service that call ES search to lookup for documents, how i can rank the retrieved documents based on those fields? for example a verified user with match should come first than un verified one.
is there some kind of document scoring while indexing the documents ? if yes can i modify it based on my criteria ?
what shall i read/look to understand how to rank in elastic search.
thanks
I guess the sorting function mentioned by Mikael is pretty straight forward and should cover your use cases. Check Elastic Doc for more information on that.
But in case you want to do really fancy sorting, maybe you could use a bool query and different boost values to set your desired relevancy for each matched field. It tried to come up with a real life example, but honestly didn't find one. For the sake of completeness, he following snippet should give you an idea how to achieve similar results as with the sort API (but still, i would prefer using sort).
GET /yourindexname/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Monica"
}
}
],
"should": [
{
"term": {
"is_verified": {
"value": true,
"boost": 2
}
}
},
{
"term": {
"is_creator": {
"value": true,
"boost": 2
}
}
}
]
}
}
}
is there some kind of document scoring while indexing the documents ? if yes can i modify it based on my criteria ?
I wouldn't assign a fixed score to a document while indexing, as the score should be dependent on the query. However, if you insist to have a predefined relevancy for each document, theoretically you could add a field relevancy having that value for ordering and use it later in the query:
GET /yourindexname/_search
{
"query" : {
"match" : {
"name": "Monica"
}
},
"sort" : [
{
"relevancy": {
"order": "desc"
},
"_score"
}
]
}
You can consider using the Sort Api inside your search queries ,In example below we used the search on the field country and sorted the result with respect of Boolean field (is_verified) , You can also add the other Boolean field inside Sort brackets .
GET /yourindexname/_search
{
"query" : {
"match" : {
"country": "Iceland"
}
},
"sort" : [
{
"is_verified": {
"order": "desc"
}
}
]
}

Give more score to documents that contains all query terms

I have a problem with scoring in elasticsearch. When user enter a query that contains 3 terms, sometimes a document that has two words a lot, outscores a document that contains all three words. for example if user enters "elasticsearch query tutorial", I want documents that contains all these words score higher than a document with a lot of "tutorial" and "elasticsearch" terms in it.
PS: I am using minimum should match and shingls in my query. also they made ranking a lot better, they did not solve this problem completely. I need something like query coordination in lucene's practical scoring function. is there anything like that in elastic with BM-25?
One of the possible solutions could be using function score:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"functions": [
{
"filter": { "match": { "title": "elasticserch" } },
"weight": 1
},
{
"filter": { "match": { "title": "tutorial" } },
"weight": 1
}
],
"score_mode": "sum"
}
}
}
In this case, you would have clearly a better position for documents with more matches. However, this would completely ignore TF-IDF or any other parameters.

Elasticsearch: Get documents which have minimum matching percentage

Consider I have following two documents indexed:
[
{
"name": "John Doe"
},
{
"name": "John A"
}
]
The match percentage of the word John is 50 and 66.7 with the field name of the first and second document respectively.
Now the question is, how can I find all the matches, where the match percentage is more than X, where 0<=X<=100. Match should be always prefix match.
The only way I see to do it is the use of a script query in a filter to determine a minimum length of the field (you can calculate it with your percentage and your term length):
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
// Your name: 'John' match
{
"script": {
"script": {
"params": {
"min_size": 4
},
// In ES <5.6 versions, use "inline" instead of "source"
"source": "doc['name'].values.length() > params.min_size"
}
}
}
]
}
}
}
}
}
But you will have to enable fielddata on your field.
While you can build something like this with scripting (as Julien TASSIN describes), this is not what you want:
Unless you have a filter criteria or very little data, this will be slow, since Elasticsearch needs to do some heavy calculations for every search.
Elasticsearch generally operates on tokens. While you can do a lot of things with scripting, your use case sounds like you are either using it wrong or Elasticsearch is probably not a great fit; though I don't know any other system that would work very well for this specific requirement.

How to limit the results in a multi match query?

i had used multi match phrase when I make search. However I have to put limit result of all math phrase seperately. I mean, I want to take only 2 result for each multi match. I can't find any limit/size attributes. Do you know any solution?
Example Code:
"query": {
"bool": {
"should": [
{
"match_phrase": {
"text": {
"query": " Home is clear and big ",
"slop": 2
}
}
},
{
"match_phrase": {
"text": {
"query": "365 different company use our system in test",
"slop": 2
}
}
}
]}}
use
{"limit" : 3, "from":0, "query": ...}
The simplest solution is to make to individual searches for each of the conditions. The size parameter can be set to retrieve only the first 2 results for each query.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
The boolean should query will not distinguish which condition has been satisfied: it returns documents for which at least one of the two conditions holds. The scores for the two matches will be combined into a single score but it will be impossible to tell which s

Resources