Assign custom scores to clauses in boolean query in Elasticsearch - elasticsearch

In Elasticsearch I am testing the following query:
GET sdata/_search
{
"query": {
"bool": {
"must": [
{ "match": { "f1": "sth" } }
],
"should": [
{ "match": { "f2": "sth" } }
]
}
}
}
I know that the overall score of retrieved documents depends on the number of matches they achieve. But is it possible to customize the final score so that documents matching the should clause are weighted much higher than documents that match the must clause alone? Can I add a script to determine how each clause contributes to the final score?
Thank you in advance

You can set a boost parameter on the query inside the should clause:
{
"query": {
"bool": {
"must": [
{
"match": {
"f1": "sth"
}
}
],
"should": [
{
"match": {
"f2": {
"query": "sth",
"boost": 10
}
}
}
]
}
}
}
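If the should clause is meant to add a fixed bonus rather than a relevance-scaled one, one option is to wrap it in a constant_score query, which assigns the same score to every document the wrapped query matches. Below is a minimal sketch in Python that builds such a request body. The function name build_boosted_query and the default boost of 10 are assumptions made for illustration; f1, f2 and "sth" come from the question.

```python
import json


def build_boosted_query(must_field, should_field, text, should_boost=10.0):
    """Build a bool query whose should clause contributes a fixed bonus.

    constant_score ignores the relevance score of the wrapped query and
    gives `should_boost` to every document that matches it.
    (Function name and default boost are illustrative assumptions.)
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {must_field: text}}],
                "should": [
                    {
                        "constant_score": {
                            "filter": {"match": {should_field: text}},
                            "boost": should_boost,
                        }
                    }
                ],
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted after "GET sdata/_search".
    print(json.dumps(build_boosted_query("f1", "f2", "sth"), indent=2))
```

How much the bonus dominates the final score still depends on how large the must clause's own score is, so the boost value usually needs some experimentation.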

Related

How to filter result set of elasticsearch from another bool condition

I have to fetch data from an API that uses Elasticsearch.
The conditions are that firstname should start with a given string and the company status should be active,
so I have used the query below
"span_first": {
"match": {
"span_term": {
"employee.firstname": "tas"
}
},
"end": 1
}
to match firstname, and now I need to filter the data by companyStatus:
"bool": {
"must": [
{
"match": {
"employee.companyStatus": "Active"
}
}
]
}
I'm trying to plug the above bool query into the span_first query,
but I have no idea how to do it.
Can someone help me create the query? Sorry if this is a dumb question,
I'm totally new to Elasticsearch.
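You can try a term query to filter on the status and a match query for the search terms. Note that a term query compares against the exact indexed value, so this assumes employee.companyStatus is indexed without analysis (for example as a keyword field).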
GET edx_test/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"employee.companyStatus": "Active"
}
}
],
"must": [
{
"match": {
"employee.firstname": "tas"
}
}
]
}
}
}
Read more:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
If both the span_first and the match query must be true, you can put both queries in a must clause, like below:
GET test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"employee.companyStatus": "Active"
}
},
{
"span_first": {
"match": {
"span_term": {
"employee.firstname": "tas"
}
},
"end": 1
}
}
]
}
}
}
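The two answers above can also be combined: keep span_first in must, where it is scored, and move the status condition into filter, where a clause must match but does not contribute to the score. Below is a minimal Python sketch that builds that body. The function name and its default field arguments are assumptions for illustration; the field names themselves come from the question.

```python
import json


def build_first_name_query(first_term, status,
                           name_field="employee.firstname",
                           status_field="employee.companyStatus"):
    """Combine a scored span_first clause with a non-scoring status filter.

    Note: span_term is not analyzed, so `first_term` has to be given in
    the form in which it was indexed (often lowercase).
    """
    return {
        "query": {
            "bool": {
                # Scored: the first token of the field must be `first_term`.
                "must": [
                    {
                        "span_first": {
                            "match": {"span_term": {name_field: first_term}},
                            "end": 1,
                        }
                    }
                ],
                # Not scored: a yes/no condition on the company status.
                "filter": [{"match": {status_field: status}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_first_name_query("tas", "Active"), indent=2))
```

Using filter here is a design choice, not a requirement: the must-only version above returns the same documents, but the status match then also influences the score.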

Convert intervals query to the earlier version that doesn't support it

I have an ES query that was written for a newer version of ES that supports the intervals query.
I want to convert this simple query, which uses intervals, into one that runs on an earlier 6.x version that doesn't support intervals:
GET /myindex/_search
{
"query": {
"bool": {
"should": [
{
"intervals": {
"title_en": {
"match": {
"query": "title phrase in en",
"max_gaps": -1,
"ordered": true
}
}
}
},
{
"intervals": {
"title_de": {
"match": {
"query": "title phrase in de",
"max_gaps": -1,
"ordered": true
}
}
}
}
],
"minimum_should_match" : 1,
"filter": [
{
"terms": {"status.id": [1,2]}
}
]
}
}
}
I think I should solve it with query_string.
I wrote something like this (part of it):
{
"query_string": {
"default_field": "title_en",
"query": "\"title phrase in en\"~3"
}
}
But I don't think it's the correct solution.
The following query returns results similar to those of intervals.
Each intervals clause is replaced with match_phrase, and slop is used.
The slop value controls how far apart the query terms are allowed to be in a matching document.
So the query is:
GET /myindex/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title_en": {
"query": "title phrase in en",
"slop": 5
}
}
},
{
"match_phrase": {
"title_de": {
"query": "title phrase in de",
"slop": 5
}
}
}
],
"minimum_should_match" : 1,
"filter": [
{
"terms": {"status.id": [1,2]}
}
]
}
}
}
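One caveat with the match_phrase version: a large enough slop also admits terms in a different order, whereas the original intervals rule sets ordered to true. If order matters, a span_near query with in_order set to true is another option on 6.x. Below is a minimal Python sketch that builds such a query. The function names are assumptions for illustration, and the lowercase-and-split step is only a crude stand-in for the field's real analyzer, because span_term clauses are not analyzed.

```python
import json


def ordered_terms_query(field, phrase, slop):
    """Build a span_near query that requires the terms in the given order.

    Assumption: lowercasing and splitting on whitespace approximates the
    analyzer of `field`; adjust this if the field is analyzed differently.
    """
    clauses = [{"span_term": {field: token}} for token in phrase.lower().split()]
    return {"span_near": {"clauses": clauses, "slop": slop, "in_order": True}}


def build_title_query(slop=5):
    """Assemble the full request body, mirroring the structure used above."""
    return {
        "query": {
            "bool": {
                "should": [
                    ordered_terms_query("title_en", "title phrase in en", slop),
                    ordered_terms_query("title_de", "title phrase in de", slop),
                ],
                "minimum_should_match": 1,
                "filter": [{"terms": {"status.id": [1, 2]}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_title_query(), indent=2))
```

The intervals query used max_gaps of -1 (no limit), so a generous slop value is needed to approximate it; 5 is simply carried over from the answer above.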

Give more weight to documents having true in a boolean field

(I use elasticsearch version 2.3.3)
I am doing a simple match query on a text field but now want to give more weight to documents having true in a given boolean field.
My current query is something like
{
"query": {
"match": {
"title": "QUICK!"
}
}
}
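Is that possible?
Yes. Keep the match query in a must clause and add a term query on the boolean field as a should clause; documents where the field is true then receive a higher score: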
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "QUICK!"
}
}
],
"should": [
{
"term": {
"my_boolean_field": {
"value": true
}
}
}
]
}
}
}

Elasticsearch boost score with nested query

I have the following query in Elasticsearch version 1.3.4:
{
"filtered": {
"query": {
"bool": {
"should": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "adobe creative suite"
}
}
]
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
},
{
"bool": {
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 5
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 5
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
},
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "ajax"
}
},
{
"term": {
"skills.name.original": "html"
}
}
]
}
}
]
}
}
}
Mappings look like this:
skills: {
type: "nested",
include_in_parent: true,
properties: {
name: {
type: "multi_field",
fields: {
name: {type: "string"},
original: {type : "string", analyzer : "string_lowercase"}
}
}
}
}
and finally the document structure, for skills (excluded other parts), looks like this:
"skills":
[
{
"name": "java",
"source": [
"linkedin",
"facebook"
]
},
{
"name": "html",
"source": [
"meetup"
]
}
]
My goal with this query is to first filter out irrelevant hits with the filters (bottom of the query), then score a person by searching the whole document for the match_phrase "java", with extra boosting if it also contains the match_phrase "adobe creative suite". Then I check the nested value where we get a hit in "skills" to see which "source(s)" the skill came from, and boost the query based on the source or sources the nested object has.
This kind of works, or at least I don't get any errors, but the final score is odd and it's hard to see whether it's working. If I give a small boost, let's say 2, the score goes DOWN slightly: my top hit at the moment has a score of 32.176407 with boost = 1, and with a boost of 5 it goes down to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score goes down to 2.433376.
Is this the right way to do this, or is there a better/easier way? I could change the structure, mappings, etc. And why is my score decreasing?
Edit: I have simplified the query a little, dealing with only one "skill":
{
"filtered": {
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
],
"should": [
{
"nested": {
"path": "skills",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
}
],
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 1.2
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 1.2
}
}
}
]
}
}
}
}
]
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
}
]
}
}
}
The problem now is that I have two similar documents whose only difference is the "source" value on the skill "java": "linkedin" and "meetup" respectively. In my new query they both get the same boost, so I expect similar scores, but the final _score is very different for the two documents.
From the query explanation for doc 1:
"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"
and for doc two:
"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"
These values are the only ones that differ, and I can't see why.
I can't answer the question about the boost, but how many shards does your index have?
TF and IDF are calculated per shard, not per index, and this could explain the difference in score.
https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ
If you reindex with only 1 shard, does the outcome change?
Edit: Also, the doc range is the range of docs for each document in the shard, and you can use it to calculate the IDF for each doc and verify the scores.
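To illustrate how per-shard statistics can pull scores apart, here is a minimal Python sketch of the idf term used by Lucene's classic TF-IDF similarity, 1 + ln(numDocs / (docFreq + 1)). It assumes that this similarity is the one in effect on the index, and the shard statistics below are made-up numbers for illustration, not values taken from the question.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf of Lucene's classic TF-IDF similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for one and the same term:
# (documents on the shard containing the term, documents on the shard).
shard_stats = {
    "shard_0": (3, 125),
    "shard_1": (40, 125),
}

if __name__ == "__main__":
    for shard, (doc_freq, num_docs) in shard_stats.items():
        # The rarer the term is on a shard, the larger its idf there.
        print(shard, round(classic_idf(doc_freq, num_docs), 4))
```

Two otherwise identical documents that live on different shards can therefore get different idf values for the same term. Besides reindexing into a single shard, running the search with search_type=dfs_query_then_fetch makes Elasticsearch collect term statistics across shards before scoring, which is a quick way to check whether this is the cause.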

Using term query with Or operator

I am trying to use the term query in the following way:
{
"query": {
"bool": {
"must": [
{
"term": {
"technology": "Space"
}
},
{
"term": {
"Person": "Steve Simon"
}
}
]
}
}
}
This returns only the feeds in which both fields match, like an intersection. Can I use the term query to get a UNION result instead? I want all feeds that match Space or Steve Simon individually, as well as the feeds that match both.
Use should instead of must. Also set minimum_should_match to 1, which means that at least one should clause has to match for a document to be returned.
{
"query": {
"bool": {
"should": [
{
"term": {
"technology": "Space"
}
},
{
"term": {
"Person": "Steve Simon"
}
}
],
"minimum_should_match": 1
}
}
}
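The difference between the two queries can be checked with a toy model. The Python sketch below is not Elasticsearch; it is a simplified evaluator that assumes exact-value term matching and only models must, should and minimum_should_match. The function name and the sample feeds are made up for illustration.

```python
def matches_bool(doc, bool_query):
    """Toy model of bool matching with exact-value term clauses.

    Simplification: when minimum_should_match is absent, it defaults to 1
    if there are no must clauses and to 0 otherwise.
    """
    def term_matches(clause):
        field, value = next(iter(clause["term"].items()))
        return doc.get(field) == value

    must = bool_query.get("must", [])
    should = bool_query.get("should", [])
    if not all(term_matches(c) for c in must):
        return False
    if not should:
        return True
    needed = bool_query.get("minimum_should_match", 0 if must else 1)
    return sum(term_matches(c) for c in should) >= needed


terms = [{"term": {"technology": "Space"}}, {"term": {"Person": "Steve Simon"}}]
must_query = {"must": terms}                                 # intersection
should_query = {"should": terms, "minimum_should_match": 1}  # union

# Hypothetical feeds covering all four combinations.
feeds = [
    {"technology": "Space", "Person": "Steve Simon"},
    {"technology": "Space", "Person": "Someone Else"},
    {"technology": "Cars", "Person": "Steve Simon"},
    {"technology": "Cars", "Person": "Someone Else"},
]

if __name__ == "__main__":
    print(sum(matches_bool(f, must_query) for f in feeds))    # 1
    print(sum(matches_bool(f, should_query) for f in feeds))  # 3
```

In a real index, keep in mind that term queries are not analyzed, so a value such as "Steve Simon" only matches if the field stores it as a single unanalyzed token.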
