Elasticsearch: obtaining each field score in the same document - elasticsearch

Assuming I have a document with three fields, name, company and email, each mapped with an edge n-gram analyzer:
{
  "name": "John",
  "company": "John's company",
  "email": "johndoe#gmail.com"
}
When searching for "john", I want to get the score of each field individually:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "john" } },
        { "match": { "company": "john" } },
        { "match": { "email": "john" } }
      ]
    }
  }
}
In this example the scores of the matching clauses are combined into a single score for the whole document. Is there any way to obtain the score of each match clause individually, not just the final score of the document?
Setting "explain": true is not ideal either, since it returns very low-level scoring details that are expensive to compute and difficult to parse.

I cannot think of a way that you could do this without modifying the search results.
However, if you use a different boost on each field, you might be able to work backwards to the value of each. For instance, boost one field by 1, the next by 10 and the last by 100, then examine the final score; that might give you what you are looking for, but the field boosted by 100 will effectively be the only one that matters.
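The weakness of the boost trick can be seen with a little arithmetic. Assuming the bool query simply sums the boosted clause scores (a simplified model), a combined score only decodes uniquely if every unboosted field score is known to stay below the boost ratio (10 here), which BM25 does not guarantee. The helper name is hypothetical:

```python
def boosted_total(scores: tuple[float, ...], boosts: tuple[float, ...] = (1, 10, 100)) -> float:
    """Combined score under a plain boost-and-sum model of a bool/should query."""
    return sum(score * boost for score, boost in zip(scores, boosts))


# Two different (name, company, email) score triples...
first = (5.0, 1.0, 0.0)
second = (15.0, 0.0, 0.0)
# ...produce the same combined score, so the total alone cannot tell them apart.
print(boosted_total(first), boosted_total(second))  # 15.0 15.0
```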
I'm curious about the application of this, as boosting in general might already solve what you are looking for.
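A possible alternative on Elasticsearch 8.8 and later (an assumption to verify against the search API reference for your version) is to name each clause with `_name` and send the search with the `include_named_queries_score=true` URL parameter; each hit's `matched_queries` should then map clause names to their score contributions instead of listing names only. The Python sketch below only builds the request body and reads a response dict; the helper names are hypothetical and no client library is assumed:

```python
from __future__ import annotations

from typing import Any, Optional

# Field names taken from the question; adjust to your mapping.
FIELDS = ("name", "company", "email")


def build_named_query(text: str, fields: tuple[str, ...] = FIELDS) -> dict[str, Any]:
    """Bool/should query with one named match clause per field."""
    return {
        "query": {
            "bool": {
                "should": [
                    # "_name" tags the clause so each hit can report it in matched_queries
                    {"match": {field: {"query": text, "_name": field}}}
                    for field in fields
                ]
            }
        }
    }


def per_field_scores(response: dict[str, Any]) -> list[dict[str, Optional[float]]]:
    """For each hit, map clause name -> that clause's score contribution.

    With include_named_queries_score=true, matched_queries is an object of
    name -> score; without it, it is a plain list of names, so scores are None.
    """
    per_hit = []
    for hit in response["hits"]["hits"]:
        matched = hit.get("matched_queries", {})
        if isinstance(matched, list):
            per_hit.append({name: None for name in matched})
        else:
            per_hit.append(dict(matched))
    return per_hit
```

Send the body as `POST /<your-index>/_search?include_named_queries_score=true`. Clauses that did not match a given hit do not appear in its `matched_queries`.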

Related

Elastic Search comparing sentences with synonyms?

Is there an API in Elasticsearch that can compare the following two sentences?
The weather is great
The climate is good
The scoring described at https://www.elastic.co/guide/en/elasticsearch/guide/2.x/practical-scoring-function.html doesn't help, since the two sentences have mostly different words.
The following request to the Painless execute API (POST /_scripts/painless/_execute) returns the score Elasticsearch would compute. Replace test with the name of your index and field with the name of a field that uses the correct analyzer.
{
  "script": {
    "source": "_score"
  },
  "context": "score",
  "context_setup": {
    "index": "test",
    "query": {
      "match": {
        "field": "The weather is great"
      }
    },
    "document": {
      "field": "The climate is good"
    }
  }
}
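If you need to score many sentence pairs, the request body above can be generated instead of written by hand. A minimal Python sketch; the function name is hypothetical, and it only builds the JSON for `POST /_scripts/painless/_execute` without sending it:

```python
from __future__ import annotations

import json
from typing import Any


def painless_score_request(index: str, field: str, query_text: str, doc_text: str) -> dict[str, Any]:
    """Body for the Painless execute API in the "score" context."""
    return {
        "script": {"source": "_score"},  # return the query's score unchanged
        "context": "score",
        "context_setup": {
            "index": index,  # supplies the mapping, and therefore the analyzer
            "query": {"match": {field: query_text}},
            "document": {field: doc_text},  # scored in memory, not indexed
        },
    }


body = painless_score_request("test", "field", "The weather is great", "The climate is good")
print(json.dumps(body, indent=2))
```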
You will not get a score between 0.5 and 1, though. Elasticsearch is not built for pairwise string comparison; it is built to search within a collection of documents.
If you really want a score between 0.5 and 1, you will have to write a scripted similarity.
But again, I don't think Elasticsearch fits your use case.

Give more score to documents that contains all query terms

I have a problem with scoring in Elasticsearch. When a user enters a query containing three terms, a document that contains two of the terms many times sometimes outscores a document that contains all three. For example, if the user enters "elasticsearch query tutorial", I want documents that contain all these words to score higher than a document with many occurrences of "tutorial" and "elasticsearch".
PS: I am using minimum_should_match and shingles in my query. Although they made ranking a lot better, they did not solve this problem completely. I need something like the query coordination factor (coord) in Lucene's practical scoring function. Is there anything like that in Elasticsearch with BM25?
One of the possible solutions could be using function score:
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "filter": { "match": { "title": "elasticsearch" } },
          "weight": 1
        },
        {
          "filter": { "match": { "title": "tutorial" } },
          "weight": 1
        }
      ],
      "score_mode": "sum"
    }
  }
}
In this case, documents that match more of the terms clearly rank higher. However, this completely ignores TF-IDF and any other relevance parameters.
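The same function_score idea extends to any number of query terms: one `weight: 1` function per distinct term, combined with `score_mode: sum`, so the score counts how many distinct terms match, however often each occurs. The Python sketch below builds that body; the whitespace tokenization and the local `simulated_score` model are simplifying assumptions (real matching goes through the field's analyzer), and both function names are hypothetical:

```python
from __future__ import annotations

from typing import Any


def coordination_query(field: str, text: str) -> dict[str, Any]:
    """One weight-1 function per distinct query term, summed."""
    # Naive whitespace split; dict.fromkeys drops duplicates but keeps order.
    terms = list(dict.fromkeys(text.lower().split()))
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {"filter": {"match": {field: term}}, "weight": 1} for term in terms
                ],
                "score_mode": "sum",
            }
        }
    }


def simulated_score(query: dict[str, Any], doc_terms: set[str]) -> int:
    """Rough local model: sum the weights of the filters whose term is in the document."""
    functions = query["query"]["function_score"]["functions"]
    return sum(
        f["weight"]
        for f in functions
        if next(iter(f["filter"]["match"].values())) in doc_terms
    )


q = coordination_query("title", "elasticsearch query tutorial")
print(simulated_score(q, {"elasticsearch", "query", "tutorial"}))  # 3
print(simulated_score(q, {"elasticsearch", "tutorial"}))  # 2, however often they repeat
```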

ElasticSearch Ignoring words having one single letter

I'm a beginner with Elasticsearch. I have an application that uses Elasticsearch to look up the ingredients of a given food or fruit.
I'm facing a scoring problem when the user types, for example, "Vitamine d":
Elasticsearch gives the best score to the phrase "vitamine", even though the phrase "Vitamine D" exists and should normally score highest.
It seems that when the second word ("d" in my case) is a single letter, Elasticsearch ignores it.
I tried another example, "vitamine b12", and got the correct scoring.
Here is the query the application sends to the server:
{
  "from": 0,
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "constNomFr": {
              "query": "vitamine d"
            }
          }
        }
      ],
      "should": [
        {
          "prefix": {
            "constNomFr": {
              "value": "vitamine d",
              "boost": 2
            }
          }
        }
      ]
    }
  },
  "_source": {
    "excludes": [
      "alimentDtos"
    ]
  }
}
What could I modify to make it work?
Thank you so much.
If you can identify your ingredients, I recommend indexing them in a separate field, "ingredients", with its type set to keyword. This way you can use a term filter, and you can even run aggregations.
You may already have your documents indexed that way; in that case, if you are using the default mapping, just run your query against your_field_name.keyword.
If you don't have your ingredients indexed as an array, then you should take a look at the Elasticsearch analyzers to choose or build the right one.
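A sketch of the keyword approach, assuming the `ingredients` field suggested above is mapped as `keyword`. Term queries on a keyword field are exact and case-sensitive, so "Vitamine D" has to match the stored value as written. The Python helper only builds request bodies, and its name and the aggregation name are hypothetical:

```python
from __future__ import annotations

from typing import Any

# Mapping suggested in the answer: ingredients stored as exact keyword values.
INGREDIENTS_MAPPING: dict[str, Any] = {
    "mappings": {"properties": {"ingredients": {"type": "keyword"}}}
}


def ingredient_search(ingredient: str, size: int = 5) -> dict[str, Any]:
    """Exact-match filter on one ingredient plus a terms aggregation over ingredients."""
    return {
        "size": size,
        "query": {
            "bool": {
                # filter context: exact term match, no scoring involved
                "filter": [{"term": {"ingredients": ingredient}}]
            }
        },
        "aggs": {
            # counts ingredient values across the matching documents
            "ingredient_counts": {"terms": {"field": "ingredients"}}
        },
    }
```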

Elasticsearch: Get documents which have minimum matching percentage

Consider the following two indexed documents:
[
  {
    "name": "John Doe"
  },
  {
    "name": "John A"
  }
]
The match percentage of the word "John" is 50 for the name field of the first document and 66.7 for the second (4 of 8 characters and 4 of 6 characters, respectively).
Now the question is: how can I find all matches where the match percentage is more than X, with 0 <= X <= 100? The match should always be a prefix match.
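The only way I see to do it is to use a script query in a filter that enforces a maximum length on the field (you can calculate it from your percentage and your term length: the percentage exceeds X only when the field is shorter than term_length * 100 / X):
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            // Prefix match on 'John' (lowercase, because the text field is analyzed)
            { "prefix": { "name": "john" } },
            {
              "script": {
                "script": {
                  "params": {
                    // 6 = longest name for which 'John' covers more than 60%
                    "max_length": 6
                  },
                  // In ES <5.6 versions, use "inline" instead of "source"
                  "source": "doc['name.keyword'].size() > 0 && doc['name.keyword'].value.length() <= params.max_length"
                }
              }
            }
          ]
        }
      }
    }
  }
}
This assumes the name field has a keyword sub-field (name.keyword, as created by the default dynamic mapping) whose doc values hold the original string. Running the script on the analyzed text field instead would require enabling fielddata, and the script would then only see individual tokens rather than the full value.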
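Whatever form the script takes, the length bound follows from the definition of the percentage: a term of length t covers more than X % of a value of length L exactly when t / L > X / 100, i.e. L < t * 100 / X, which is an upper bound on the field length. A small Python helper to compute it; the function names are hypothetical, and X must be greater than 0:

```python
import math


def match_percentage(term: str, field_value: str) -> float:
    """Share of the field value's characters covered by the term, in percent."""
    return 100.0 * len(term) / len(field_value)


def max_length(term: str, min_percentage: float) -> int:
    """Longest field value (in characters) for which the term covers MORE than min_percentage."""
    if min_percentage <= 0:
        raise ValueError("min_percentage must be > 0")
    # len(term) / L > X / 100  <=>  L < len(term) * 100 / X
    bound = len(term) * 100 / min_percentage
    return math.ceil(bound) - 1  # largest integer strictly below the bound


print(match_percentage("John", "John Doe"))  # 50.0
print(max_length("John", 60))  # 6: "John A" qualifies, "John Doe" does not
```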
While you can build something like this with scripting (as Julien TASSIN describes), this is not what you want:
Unless you have other filter criteria or very little data, this will be slow, since Elasticsearch has to run the script for every candidate document on every search.
Elasticsearch generally operates on tokens. While you can do a lot of things with scripting, your use case sounds like you are either using it wrong or Elasticsearch is probably not a great fit; though I don't know any other system that would work very well for this specific requirement.

Calculate counts of hits of several subqueries inside one query to Elasticsearch

I have 3 fields in a document that I need to match. I'd like to identify which of those 3 fields have any matches.
More specifically, I'd like to find out whether the given wildcard query matches only one field across the document set or matches several fields. If the wildcard query matches only, say, field1, then I can conclude that it applies only to field1. If it matches two or three fields, then I cannot draw that conclusion, and I'll wait for the user to enter more characters to narrow the search.
I've written the following query that matches all 3 fields:
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "field1": "*R*" } },
        { "wildcard": { "field2": "*R*" } },
        { "wildcard": { "field3": "*R*" } }
      ]
    }
  },
  "size": 0
}
It returns the total count of all documents that have matches in any of those fields. Now I'd like to know whether it's possible to get three separate counts, one per subquery. This can be achieved by sending three separate requests, but I'd like to minimize the number of requests to Elasticsearch.
I've tried bool and dis_max queries but could not find a solution.
UPDATE
Using named queries I've built the following query:
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "field1": { "value": "*R*", "_name": "query1" } } },
        { "wildcard": { "field2": { "value": "*R*", "_name": "query2" } } },
        { "wildcard": { "field3": { "value": "*R*", "_name": "query3" } } }
      ]
    }
  },
  "size": 1
}
This query returns a single result with the best score. By default, the score is higher when more fields are matched in the same document. So if the found document was matched by two or three fields, it already answers my initial question. However, if the found document was matched by a single field, say, field1, it does not guarantee there are no other documents that are matched by field2 or field3, so it's still not a solution.
Do I have to send 3 requests to run searches over each field separately to solve my problem?
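One possible single-request approach, not taken from the original thread, is a `filters` aggregation with one named filter per field; the response then carries a document count for each field. A Python sketch that builds the body and interprets the buckets, using the field names from the question and hypothetical helper names:

```python
from __future__ import annotations

from typing import Any

FIELDS = ("field1", "field2", "field3")


def per_field_count_request(pattern: str, fields: tuple[str, ...] = FIELDS) -> dict[str, Any]:
    """size 0 search whose filters aggregation counts wildcard matches per field."""
    return {
        "size": 0,
        "query": {"bool": {"should": [{"wildcard": {f: pattern}} for f in fields]}},
        "aggs": {
            "per_field": {
                # named filters -> buckets keyed by field name in the response
                "filters": {"filters": {f: {"wildcard": {f: pattern}} for f in fields}}
            }
        },
    }


def fields_with_hits(response: dict[str, Any]) -> list[str]:
    """Names of the fields whose bucket contains at least one document."""
    buckets = response["aggregations"]["per_field"]["buckets"]
    return [name for name, bucket in buckets.items() if bucket["doc_count"] > 0]
```

If `fields_with_hits` returns a single name, the pattern applies only to that field; if it returns several, wait for more characters before narrowing the search.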
