Adding boost to Elasticsearch query - elasticsearch

I'm trying to add weight to some results from Elasticsearch.
I'm currently only filtering on an 'active' boolean to grab only the published items:
query: {
  filtered: {
    query: {
      match: {
        _all: params[:q]
      }
    },
    filter: {
      term: {
        active: true
      }
    }
  }
}
I now want to add weight to some of my models. For example, a Market should get a +2 boost. I tried something like this (search_type is a field on my results; it is basically the Rails model name):
POST _search
{
"query": {
"function_score": {
"query": {
"match": {
"_all": "hospitality"
}
},
"functions": [
{
"filter": {
"term": {
"active": true
}
}
},
{
"filter": {
"term": {
"search_type": "Market"
}
},
"weight": 2
}
]
}
}
}
However, that does not work: "One entry in functions list is missing a function". So I added "weight": 1 to the active filter, but now it says it can't parse the query.
I have no experience with Elasticsearch and the docs are quite confusing. I also tried custom_filters_score, but that doesn't seem to work on my version of ES (as described here: http://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/). Another option I tried was a bool query combining must and should, but that returned zero results.
Not sure how to proceed. Some insights would be great.

You should be able to use a filtered query together with function_score to achieve this.
Example:
{
  "query": {
    "filtered": {
      "query": {
        "function_score": {
          "query": {
            "match": {
              "_all": "hospitality"
            }
          },
          "functions": [
            {
              "filter": {
                "term": {
                  "search_type": "Market"
                }
              },
              "weight": 2
            }
          ]
        }
      },
      "filter": {
        "term": {
          "active": true
        }
      }
    }
  }
}
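The request body above can also be assembled programmatically, much like the Ruby hash in the question. The following is a minimal sketch in Python, not a definitive implementation: the helper name build_boosted_query and its parameters are hypothetical, and the field names (_all, search_type, active) are taken from the question.

```python
import json


def build_boosted_query(text, boosted_type="Market", weight=2):
    """Build the filtered + function_score body from the answer as a dict.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    scored = {
        "function_score": {
            # The full-text match produces the base relevance score.
            "query": {"match": {"_all": text}},
            # Documents matching the filter get their score multiplied by weight.
            "functions": [
                {
                    "filter": {"term": {"search_type": boosted_type}},
                    "weight": weight,
                }
            ],
        }
    }
    return {
        "query": {
            "filtered": {
                "query": scored,
                # Non-scoring restriction to published items.
                "filter": {"term": {"active": True}},
            }
        }
    }


body = build_boosted_query("hospitality")
print(json.dumps(body, indent=2))
```

Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. On those versions the same idea is expressed with a bool query that carries the function_score query in must and the active term in filter.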

Related

How to join two queries in one using elasticsearch?

Hi, I want to combine two queries into one in Elasticsearch, but I don't know how. I think I should use an aggregation, but it is not clear to me how to do it. Could you help me? My ES version is 5.1.2.
First filter by status and name:
POST test_lite/_search
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"match": {
"STATUS": "Now"
}
},
{
"match": {
"NAME": "PRUDENTL"
}
}
]
}
}
}
}
}
Then search the filtered records for the word "english" in the description:
POST /test_lite/_search
{
"query": {
"wildcard" : { "DESCRIPTION" : "*english*" }
}
}
The only query needed is:
POST test_lite/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "STATUS": "Now"
          }
        },
        {
          "match": {
            "NAME": "PRUDENTL"
          }
        },
        {
          "wildcard": {
            "DESCRIPTION": "*english*"
          }
        }
      ]
    }
  }
}
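The reason no aggregation is needed: aggregations summarize the matching documents, while narrowing the hits is the job of the query itself, and every clause under must has to match. A minimal sketch of that composition in Python follows; the helper name all_of is hypothetical, and the field names come from the question.

```python
import json


def all_of(*clauses):
    """Combine query clauses so that a document must match every one of them.

    Hypothetical helper: it only wraps the clauses in bool/must.
    """
    return {"query": {"bool": {"must": list(clauses)}}}


body = all_of(
    {"match": {"STATUS": "Now"}},
    {"match": {"NAME": "PRUDENTL"}},
    {"wildcard": {"DESCRIPTION": "*english*"}},
)
print(json.dumps(body, indent=2))
```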

Function_score query with filters

I need to take the company's priority into account in the search results. The documentation has a similar example, "Boosting Filtered Subsets". Following it, I built the JSON request below, but it returns an error. Perhaps I put the filtering part in the wrong place. In what order should the query be written so that the filtering is taken into account?
{
"index":"base",
"type":"info",
"from":0,
"size":10,
"fields":["cID","firm_html"],
"body":
{
"query":
{
"bool":
{
"must_not":[
{
"match":
{
"info.published":"no"
}
}
],
"should":[
{
"query_string":
{
"default_field":
"info.shortname",
"query":"value"
}
},
{
"match":
{
"_all":"value"
}
},
]
},
"function_score":
{
"functions":[
{
"filter":
{
"term":
{
"razm_prio":"0.4"
}
},
"weight":"0.4"
},
],
"score_mode":"sum"
}
}
}
}
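The error message is not shown, so the following is an assumption rather than a confirmed diagnosis: a query object can hold only one top-level query type, yet the request above places bool and function_score side by side. One way to combine them is to nest the bool query inside function_score as its query. The sketch below builds that structure in Python; the helper name boost_subset is hypothetical, the weight is passed as a number instead of a string, and the trailing commas in the original request would also have to go if it is sent as strict JSON.

```python
import json


def boost_subset(base_query, field, value, weight):
    """Wrap base_query in function_score, boosting documents where field == value.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return {
        "function_score": {
            "query": base_query,
            "functions": [
                {"filter": {"term": {field: value}}, "weight": weight}
            ],
            "score_mode": "sum",
        }
    }


# The bool query from the question, unchanged.
bool_query = {
    "bool": {
        "must_not": [{"match": {"info.published": "no"}}],
        "should": [
            {"query_string": {"default_field": "info.shortname", "query": "value"}},
            {"match": {"_all": "value"}},
        ],
    }
}

# function_score is now the only key under "query".
request_body = {"query": boost_subset(bool_query, "razm_prio", "0.4", 0.4)}
print(json.dumps(request_body, indent=2))
```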

Terrible has_child query performance

The following query has terrible performance.
I am 100% sure the has_child clauses are the cause: without them the query runs in under 300 ms; with them it takes 9 seconds.
Is there a better way to use the has_child query? It seems I could query the parents, then the children by id, and join client-side to do the has_child check faster than Elasticsearch does it.
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"query": {
"term": {
"stage": "es"
}
}
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
Cluster info:
CPU and memory usage is low. It is an AWS ES Service cluster (v1.5.2) holding many small documents. Since the version AWS runs is old, doc values aren't on by default; I'm not sure whether that helps or hurts.
Since "stage" is not analyzed (based on your comment) and you are therefore not interested in scoring the documents that match on that field, you might see slight performance gains by using the has_child filter instead of the has_child query, and a term filter instead of a term query.
In the documentation for has_child, you'll notice:
The has_child filter also accepts a filter instead of a query:
The main performance benefits of using a filter come from the fact that Elasticsearch can skip the scoring phase of the query. Also, filters can be cached which should improve the performance of future searches that use the same filters. Queries, on the other hand, cannot be cached.
Try this instead:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"source": "IntegrationTest-2016-03-01T23:31:15.023Z"
}
},
{
"range": {
"eventTimestamp": {
"from": "2016-03-01T20:28:15.028Z",
"to": "2016-03-01T23:33:15.028Z"
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "s3"
}
}
}
},
{
"has_child": {
"type": "status",
"filter": {
"term": {
"stage": "es"
}
}
}
}
]
}
}
}
},
"aggs": {
"digests": {
"terms": {
"field": "digest",
"size": 0
}
}
},
"size": 0
}
I bit the bullet and just performed the parent:child join in my application. Instead of waiting 7 seconds for the has_child query, I fire off two consecutive term queries and do some post processing: 200ms.
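The post-processing step is not shown, so here is one way the application-side join could look, as a sketch under stated assumptions: each child hit is reduced to a (parent_id, stage) pair, and a parent qualifies only if it has children in every required stage. The function name and the sample data are hypothetical.

```python
def parents_with_all_stages(child_hits, required_stages):
    """Return the ids of parents that have a child in every required stage.

    child_hits is an iterable of (parent_id, stage) pairs, for example
    collected from the results of the term queries on the child type.
    """
    stages_by_parent = {}
    for parent_id, stage in child_hits:
        stages_by_parent.setdefault(parent_id, set()).add(stage)
    required = set(required_stages)
    # Keep a parent only if its stages include all required ones.
    return {
        parent_id
        for parent_id, stages in stages_by_parent.items()
        if required <= stages
    }


# Made-up sample data: p1 has both stages, p2 and p3 only one each.
hits = [("p1", "s3"), ("p1", "es"), ("p2", "s3"), ("p3", "es")]
matching = parents_with_all_stages(hits, ["s3", "es"])
print(sorted(matching))
```

This mirrors the two has_child clauses under must in the original query: both stages have to be present for the same parent.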

How to do nested AND and OR filters in ElasticSearch?

My filters are grouped together into categories.
I would like to retrieve documents where a document can match any filter in a category, but if two (or more) categories are set, then the document must match any of the filters in ALL categories.
If written in pseudo-SQL it would be:
SELECT * FROM Documents WHERE (CategoryA = 'A') AND (CategoryB = 'B' OR CategoryB = 'C')
I've tried nested filters like so:
{
"sort": [{
"orderDate": "desc"
}],
"size": 25,
"query": {
"match_all": {}
},
"filter": {
"and": [{
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"progress": "incomplete"
}
}, {
"term": {
"progress": "completed"
}
}]
}
}
}, {
"nested": {
"path":"hits._source",
"filter": {
"or": [{
"term": {
"paid": "yes"
}
}, {
"term": {
"paid": "no"
}
}]
}
}
}]
}
}
But evidently I don't quite understand the ES syntax. Is this on the right track or do I need to use another filter?
This should be it (translated from the given pseudo-SQL):
{
"sort": [
{
"orderDate": "desc"
}
],
"size": 25,
"query":
{
"filtered":
{
"filter":
{
"and":
[
{ "term": { "CategoryA":"A" } },
{
"or":
[
{ "term": { "CategoryB":"B" } },
{ "term": { "CategoryB":"C" } }
]
}
]
}
}
}
}
I realize you're not mentioning facets, but just for the sake of completeness:
You could also use a filter as the basis (like you did) instead of a filtered query (like I did). The resulting JSON is almost identical, with this difference:
a filtered query filters both the main results and the facets
a filter only filters the main results, NOT the facets.
Lastly, nested filters (which you tried using) are not about 'nesting filters', as you seemed to believe, but about filtering on nested documents (a parent-child-like structure).
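The rule from the question (any value within a category, all categories together) can be generated from the selected filters. This is a minimal sketch, assuming the selection arrives as a mapping from field name to accepted values; both function names are hypothetical. It uses a terms filter for the OR inside each category and bool/must for the AND across categories, which is a variation on the and/or filters shown above, not the answer's exact form. The second function restates the same rule in plain Python to make the semantics explicit.

```python
def categories_to_filter(selected):
    """Build a filter: OR within each category, AND across categories."""
    clauses = [
        {"terms": {field: list(values)}}  # matches any of the values
        for field, values in sorted(selected.items())
        if values  # a category with nothing selected adds no restriction
    ]
    return {"bool": {"must": clauses}}


def matches(doc, selected):
    """Plain-Python statement of the same rule, for illustration."""
    return all(
        doc.get(field) in values
        for field, values in selected.items()
        if values
    )


selected = {"CategoryA": ["A"], "CategoryB": ["B", "C"]}
category_filter = categories_to_filter(selected)
print(category_filter)
```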
Although I have not completely understood your structure, this might be what you need.
You have to think tree-wise. You create a bool whose must (= and) clauses are the embedded bools. Each embedded bool checks that the field either does not exist or (using should here instead of must) has one of the values in the list (terms here).
I am not sure whether there is a better way, and I do not know how it performs. The match_all query inside filtered is not necessary and can be omitted.
{
  "sort": [
    {
      "orderDate": "desc"
    }
  ],
  "size": 25,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "bool": {
                "should": [
                  {
                    "not": {
                      "exists": {
                        "field": "progress"
                      }
                    }
                  },
                  {
                    "terms": {
                      "progress": [
                        "incomplete",
                        "completed"
                      ]
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "not": {
                      "exists": {
                        "field": "paid"
                      }
                    }
                  },
                  {
                    "terms": {
                      "paid": [
                        "yes",
                        "no"
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Why script in custom_filters_score behaves as boost?

{
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [
{
"filter": {
"term": {
"subject": "math"
}
},
"script": "_score + doc['subject_score'].value"
}
]
}
}
}
With the script written as above, it gives the error: unresolvable property or identifier: _score
If the script is "script": "doc['subject_score'].value", it multiplies the _score in the same way a boost does. I want to replace the Elasticsearch _score with a custom score.
If I understood you correctly, you would like to use Elasticsearch scoring if the subject is not math and custom scoring if the subject is math. If you are using Elasticsearch v0.90.4 or higher, this can be achieved with the new function_score query:
{
"query": {
"function_score": {
"query": {
"term": {
"name": "user1234"
}
},
"functions": [{
"filter": {
"term": {
"subject": "math"
}
},
"script_score": {
"script": "doc[\"subject_score\"].value"
}
}, {
"boost_factor": 0
}],
"score_mode": "first",
"boost_mode": "sum"
}
}
}
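To make the combination rules concrete, here is a small simulation in Python of how I understand this request to score a document. It is a sketch of the semantics, not Elasticsearch code, and the function name is hypothetical. The assumptions: score_mode first takes the value of the first function whose filter matches (a function without a filter always matches), and boost_mode sum adds that value to the query score.

```python
def combined_score(query_score, doc):
    """Simulate score_mode "first" with boost_mode "sum" for the request above."""
    if doc.get("subject") == "math":
        # First function: its filter matches, so the script value is used.
        function_value = doc["subject_score"]
    else:
        # Second function: no filter, boost_factor 0 acts as the fallback.
        function_value = 0
    # boost_mode "sum": add the function value to the query score.
    return query_score + function_value


print(combined_score(1.5, {"subject": "math", "subject_score": 10}))
print(combined_score(1.5, {"subject": "history", "subject_score": 10}))
```

Under these assumptions, math documents end up with the query score plus subject_score, and all other documents keep their query score unchanged.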
Prior to v0.90.4 you would have to resort to a combination of custom_score and custom_filters_score:
{
"query": {
"custom_score": {
"query": {
"custom_filters_score": {
"query": {
"term": {
"name": "user1234"
}
},
"filters": [{
"filter": {
"term": {
"subject": "math"
}
},
"script": "-1.0"
}]
}
},
"script": "_score < 0.0 ? _score * -1.0 + doc[\"subject_score\"].value : _score"
}
}
}
or, as @javanna suggested, use multiple custom_score queries combined by a bool query:
{
"query": {
"bool": {
"disable_coord": true,
"should": [{
"filtered": {
"query": {
"term": {
"name": "user1234"
}
},
"filter": {
"bool": {
"must_not": [{
"term": {
"subject": "math"
}
}]
}
}
}
}, {
"filtered": {
"query": {
"custom_score": {
"query": {
"term": {
"name": "user1234"
}
},
"script": "doc['subject_score'].value"
}
},
"filter": {
"term": {
"subject": "math"
}
}
}
}]
}
}
}
Firstly, I'd like to say that there are many ways of customising the scoring in Elasticsearch, and it seems you may have accidentally picked the wrong one. I will summarize just two, and you will see what the problem is:
Custom Filters Score
If you read the docs on custom_filters_score (carefully), you will see that it is there for performance reasons: it lets scoring use the faster filter machinery of Elasticsearch. (Filters are faster because scoring is not calculated when computing the hit set, and they are cached between requests.)
At the end, the docs mention that custom_filters_score can take a "script" parameter instead of a "boost" parameter per filter. The best way to think of this is as calculating a number that is passed up to the parent query, to be combined with the other sibling queries into the total score for the document.
Custom Score Query
According to the docs, this is used when you want to customise the score from the query and change it however you wish. A _score variable is available in your "script"; it is the score of the query inside the custom_score query.
Try this:
{
  "query": {
    "filtered": {
      "query": {
        "custom_score": {
          "query": {
            "match_all": {}
          },
          "script": "doc['subject_score'].value"
        }
      },
      "filter": {
        "and": [
          {
            "term": {
              "subject": "math"
            }
          },
          {
            "term": {
              "name": "user1234"
            }
          }
        ]
      }
    }
  }
}
Note: if you wanted to, you could use _score in the script here. Also, I moved both of your "term" parts into filters, since any match of a term would get the same score and filters are faster.
Good luck!
